Levels of Processing in Phonological Fusion*
James Eric Cutting



ABSTRACT 

Phonological fusion occurs when the phonemes of two different 
speech stimuli are combined into a new percept which is longer and 
linguistically more complex than either of the two inputs. For 
example, when PAY is presented to one ear and LAY to the other, the
subject often perceives PLAY. The purpose of the present studies was 
to determine whether both higher-level linguistic cues and lower-
level nonlinguistic cues were responsible for fusion. Fusible stimuli
were varied along linguistic and nonlinguistic dimensions in order to 
determine the level at which the information from the two inputs is 
combined into a single percept. 

Fusion occurred independently of wide variations in the nonlinguis-
tic dimensions of the stimuli. When to-be-fused stimuli were varied
in their relative onsets by 100 msec or more, fusion still occurred at
a high rate. Pitch differences of 20 Hz and intensity differences of 
15 db had no effect on fusion rate. Insensitivity to these nonlin- 
guistic stimulus dimensions is a characteristic of higher-level pro- 
cesses. 

Although it was independent of nonlinguistic cues, phonological 
fusion was influenced by many kinds of linguistic cues. At the seman-
tic level, fusible pairs yielded higher fusion rates when embedded in
a sentence context than when presented as isolated pairs. At the
phoneme level, fusion rates were higher for certain phonemes than for
others: for example, stop/liquid pairs such as BED/LED fused much
more readily than fricative/liquid pairs such as FED/LED. While the
particular phoneme chosen as the first consonant in a to-be-fused
cluster played an important role in fusion, the second played a less
clear role. All liquid (/r,l/) and semivowel (/w,y/) stimuli fused
equally well when paired with an appropriate stop consonant. At the 
acoustic level, specific cues were also important in facilitating 
fusion. For example, the second formant transition of the liquid 
stimulus (or one similar to it) was necessary, but not entirely suf-
ficient, for fusion to occur.

Thus, phonological fusion was insensitive to nonlinguistic stim- 
ulus variation, but sensitive to linguistic variation. These findings 
suggest that phonological fusion is a higher-level phenomenon. More-
over, they lend further support to the view that there are different
processing mechanisms for linguistic and nonlinguistic dimensions. 

INTRODUCTION 

Most of the dichotic listening literature has dealt with the phenomenon of 
perceptual rivalry. Given a different stimulus presented to each ear at the
same time, the subject typically reports hearing one or both of the stimuli. 
The different information contained in each stimulus is not combined into a 
single percept. Thus, for example, given the dichotic digits ONE/FIVE, the sub-
ject never reports hearing FUN or WIVE. Perceptual fusion does occur, however,
when certain variables are taken into account. In several types of fusion phenomena the
stimulus variables which facilitate fusion appear to be psycholinguistic in 
nature. For example, given the dichotic pair BANKET/LANKET the subject often 
reports hearing BLANKET (Day, 1970a). In this type of fusion segments of both 
stimuli are combined to form a new percept which is longer and linguistically
more complex than either of the two inputs.

This phenomenon is called phonological fusion because it conforms to the 
phonological rules of English: given BANKET/LANKET the subject reports hearing 
BLANKET, not LBANKET. According to phonological rules of cluster formation in 
English, initial stop consonant + liquid clusters are allowed but initial liquid 
+ stop clusters are not. Day (1970b) found that when stimuli were paired such 
that these constraints were removed, fusion occurred in both directions. Given 
the stimuli TASS/TACK, for example, the subject reported hearing TASK on some 
trials and TACKS on others. Both /sk/ and /ks/ clusters are permissible in 
final position in English. 

Phonological fusion cannot be explained as a response bias for acceptable 
English words. Day (1968) found that when different productions of BANKET were 
presented to each ear, subjects reported hearing BANKET. That is, they did not 
report hearing the acceptable English word that corresponded most closely to 
the nonword inputs. Likewise, LANKET/LANKET yielded LANKET. Only when the stim- 
uli were BANKET/LANKET did subjects report hearing BLANKET, regardless of which 
stimulus was presented to each ear. 

Fusion also cannot be explained in terms of subjects' expectations. Day (in
preparation-a) informed subjects before the fusion task about the type of
stimuli they were to hear: among other items, some pairs consisted of different
productions of BLACK (BLACK/BLACK), and some consisted of BACK and LACK
(BACK/LACK). Given the opportunity to write down BLACK, BACK, or LACK as a response
for any trial, subjects typically reported hearing BLACK in both conditions.



A Levels-of-Processing Approach to the Study of Cognition



There are two basic experimental approaches to the systematic study of 
process levels in cognition. In one approach the experimenter varies the task
while holding the stimuli constant. In the second approach the experimenter
varies the stimuli while holding the task constant. Typically, in the task-
variation strategy the experimenter requires the subject to process different
dimensions of the same stimuli. Day and Cutting (1970); Day, Cutting, and
Copeland (1971); Wood, Goff, and Day (1971); Day and Wood (1972); and Wood (1973)
have used such an approach. In all these studies stimuli were chosen so that
in one task subjects were required to process linguistic dimensions of the stim-
uli, while in another task they were required to process nonlinguistic dimen-
sions. In the stimulus-variation approach stimuli are varied along linguistic
dimensions in some cases and nonlinguistic dimensions in others. The effects of
the different types of variation are measured in the results of a common task.
Day and Cutting (1971) and Cutting (1973c) have used this approach for tasks
involving dichotic rivalry.

The present experiments used the stimulus-variation approach to study di-
chotic phonological fusion. Overall fusion level (or fusion rate) was the pri-
mary dependent variable. By varying the stimuli in a fusible pair along a par-
ticular dimension and observing fusion rate for varied and nonvaried fusible
pairs, the level at which information is combined from the two inputs can be
determined. Nonlinguistic variables, such as timing, pitch, and intensity, are
considered first; linguistic variables, such as semantic context, the par-
ticular phonemes to be fused, and the acoustic structure of the stimuli, are
examined second. In order to assess the importance of linguistic and nonlinguis-
tic parameters, stimuli must be carefully controlled. For example, in considering
the importance of pitch on fusion rate, care must be taken so that there is no
uncontrolled variation along another dimension, such as intensity or duration.
Precise variation along linguistic and nonlinguistic dimensions is most readily
achieved through the use of synthetic speech. Therefore, synthetic speech
stimuli were used in all experiments. Although synthetic speech typically sounds
somewhat artificial, especially to naive subjects, Cutting (1973a) found that the
rules that governed the fusibility of synthetic stimuli and natural speech
stimuli were the same, and that there were no inherent artifacts in the percep-
tion of synthetic speech which affected fusion results.

Types of Auditory Fusion 

Since the 1950's a number of dichotic phenomena have been called fusion by
various researchers. Broadbent (1955), Day (1968), and Halwes (1969), among
others, have described experimental situations in which two auditory signals
presented separately to each ear were perceived as one. From the titles of their
papers one would assume that they were concerned with the same process: "On the
fusion of sounds reaching different sense organs" (Broadbent and Ladefoged,
1957); "Fusion in dichotic listening" (Day, 1968); and "Effects of dichotic
fusion on the perception of speech" (Halwes, 1969).

Fusion, however, is not one phenomenon, but many phenomena which are only 
tenuously related. Subsuming them all under the single label fusion with no 
descriptive adjective easily leads to confusion. Cutting (1972) has described 
six different types of auditory fusion and argued that they can be divided into 
two general groups according to their relative sensitivity to several stimulus 
parameters. The groups are designated lower-level and higher-level fusions.
Examples of the stimuli and possible fusion responses for each type are shown in
Figure 1.

Lower-level fusions are dependent on nonlinguistic dimensions of the stim-
uli. If timing, in terms of the relative onset times of the two dichotic stim-
uli, is varied within a very small range, fusion disintegrates and two stimuli are
heard. For lower-level fusions this range is often a matter of microseconds. In
addition, small differences between the stimuli in pitch (2 Hz) or intensity
(2 db) are often sufficient to eliminate fusion so that the two stimuli are per-
ceived as separate entities.

Higher-level fusions are relatively independent of these stimulus dimen-
sions. Timing (relative onset time) differences of 25 msec are often insuffi-
cient to reduce fusion rates. Pitch and intensity may also vary between the two
stimuli within a much greater range: differences of 20 Hz or 30 db may not
reduce fusion rate at all. It appears that information in the stimuli, rather
than relative onset time, pitch, or intensity, is important to fusion at higher levels.

Listed below are the six types of auditory fusion shown in Figure 1 along 
with a brief description of the phenomenon involved in each. Table 1 summarizes
the relative sensitivities of five fusions to the nonlinguistic parameters of
time, pitch, and intensity. For a more complete discussion of each fusion and 
for comparisons among them see Cutting (1972). 



TABLE 1: Nonlinguistic dimensions which are relevant for the separation
of lower-level and higher-level fusions. Tolerances of stim-
ulus variation are listed within each cell. Specific numbers
reflect current knowledge.

                               Timing      Pitch      Intensity

LOWER-LEVEL FUSIONS
1. Sound Localization          <2 msec     <2 Hz      <2 db
2. Spectral Fusion             <5 msec     <2 Hz      *
3. Psychoacoustic Fusion       *           <2 Hz      *

HIGHER-LEVEL FUSIONS
4. Phonetic Feature Fusion     25 msec     >20 Hz     >30 db
5. Chirp Fusion                25 msec     >20 Hz     >30 db

*Systematic data not available.



Figure 1: Six fusions of /da/. Schematic spectrograms of speech
and speech-like stimuli in six types of auditory fusion.

1. Sound localization occurs for all audible sounds, speech and nonspeech.
The first display of Figure 1 shows that when /da/ is presented to both ears at
the same time, pitch, and intensity, the subject perceives one /da/ localized at
the midline. Timing differences of 2 msec and pitch differences of 2 Hz are
sufficient to cause the fused percept to disintegrate into two elements. Inten-
sity differences of 2 db are sufficient to change the fused percept.

2. Spectral fusion occurs for speech sounds and for complex nonspeech
sounds. For example, when the first formant (F1) of /da/ is presented to one
ear and the second formant (F2) to the other, subjects perceive the fused /da/
as if it had undergone no special presentation technique. Timing differences of
5 msec and pitch differences of 2 Hz are sufficient to disrupt fusion so that
two items are heard.

3. Psychoacoustic fusion probably occurs for both speech and nonspeech
sounds, but only speech stimuli have been considered in an experimental situation.
For example, when /ba/ is presented to one ear and /ga/ to the other, subjects
often report hearing the fusion /da/. Such a fusion can only be accounted for
by averaging the second formant (F2) transitions of /b/ and /g/. Pitch is the
only dimension which has been explored experimentally: differences of 2 Hz are
sufficient to inhibit fusion.

4. Phonetic feature fusion occurs only for competing speech segments. When
/ba/ and /ta/, for example, are presented to opposite ears, subjects report hear-
ing a "blend" of the phonetic features in the stimuli: /da/ or /pa/. Such
responses involve extracting the voicing feature from one stop consonant and com-
bining it with the place feature of the other stop. These responses cannot be
accounted for by acoustic averaging. Timing differences of 25 msec do not dis-
rupt this form of fusion, although greater differences decrease fusion rate.
Pitch differences as large as 20 Hz and intensity differences of 30 db appear to
have little effect on fusion rate.

5. Chirp fusion is demonstrated in the fifth display of Figure 1. If the
second formant transition is separated from /da/, it sounds like a pitch sweep,
rather similar to a bird's twitter; hence the name "chirp." The remainder of
the stimulus, the "chirpless" /da/, sounds rather ambiguous, and resembles /ba/
more than /da/. When the chirp stimulus is presented to one ear and the "chirp-
less" /da/ to the other, the subject often reports hearing a complete /da/ plus
a nonspeech chirp. Hence the chirp is perceived in two forms at the same time. 
Pilot work suggests that timing differences of 25 msec do not inhibit fusion, 
although greater relative onset times reduce fusion rate. Pitch differences of 
20 Hz and intensity differences of 30 db have little or no effect on fusion rate. 

6. Phonological fusion occurs for pairs of phonemes which can form per- 
missible clusters; for example, when /da/ is presented to one ear and /ra/ to 
the other, the subject often perceives /dra/. Previous studies have shown that 
phonological fusion is tolerant to lead-time differences of as much as 150 msec
(Day, in preparation-b). Experiment I was designed to test fusion at longer
lead-time differences. The effects of pitch and intensity variations between
the stimuli had not been measured. Experiments II and III were designed to
obtain this information and to aid in the classification of phonological fusion
as a higher- or lower-level fusion. 
GENERAL METHODOLOGY



Terms 

1. Stop stimulus. The member of a fusible pair which begins with a stop
consonant, for example, PAY.

2. Liquid stimulus. The member of a fusible pair which begins with a
liquid, for example, RAY and LAY.

3. Fusion response. A combination of phonemes from each ear into a fused
cluster, for example, PLAY.

4. Lead time. The temporal interval between the onsets of the stimuli in
a dichotic pair. Relative onset time and stimulus onset asynchrony are terms
synonymous with lead time.

5. Dichotic presentation. The presentation of two different stimuli, one
to each ear.

6. Diotic presentation. The presentation of the same stimulus to both ears
at the same time such that the subject perceives one stimulus localized at the
midline. Often this type of presentation has been called binaural; however,
binaural presentation is a general term denoting the stimulation of both ears at
the same time. Thus, in a strict sense, both dichotic and diotic presentations
are examples of binaural stimulation (Licklider, 1951:1027).

Conventions

Capital letters are used to indicate both stimuli and responses. For
example, PAY and LAY often yielded the fused response PLAY. A slash between two
stimuli indicates a dichotic pair (PAY/LAY), and an arrow (→) should be read
as "yields"; thus, PAY/LAY → PLAY. Phonemes are indicated in lower case letters
between a pair of slashes, such as /r/ and /l/.
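
The notation can be made concrete with a small sketch. It assumes, purely for illustration, that stimuli are handled as spelled words and that the fused response is the initial consonant letter of the stop stimulus followed by the whole liquid stimulus. The function names are hypothetical, and spelling is only a stand-in for the phonemic description used in the text.

```python
def fusion_response(stop_stimulus: str, liquid_stimulus: str) -> str:
    """Spell the expected fused response for a stop/liquid pair.

    Assumption: the stop stimulus begins with a single consonant letter,
    so PAY + LAY gives P + LAY = PLAY.
    """
    return stop_stimulus[0] + liquid_stimulus


def notate(stop_stimulus: str, liquid_stimulus: str) -> str:
    """Write a dichotic pair and its fused response with a slash and an arrow."""
    fused = fusion_response(stop_stimulus, liquid_stimulus)
    return f"{stop_stimulus}/{liquid_stimulus} -> {fused}"


print(notate("PAY", "LAY"))
print(notate("KICK", "RICK"))
```

The ASCII arrow in the output stands for "yields," as in the conventions above.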

Method 

Stimuli and tapes. Synthetic stimuli used in all experiments were generated
on the Haskins Laboratories' parallel resonance synthesizer. They were first
synthesized by rule and then modified using the computer-controlled EXECUTIVE
system (Mattingly, 1968) so that all parameters of the synthetic speech more
nearly resembled those found in natural speech. Natural speech parameters were
gathered from a number of sources including spectrograms, oscillograms, and
published sources of parameter values (Lisker, 1957; O'Connor, Gerstman, Liberman,
Delattre, and Cooper, 1957; Liberman, Ingemann, Lisker, Delattre, and Cooper,
1959; and Lehiste, 1962). Specific aspects of the acoustic structure of the
stimuli are explained in Appendix A and in Cutting and Day (1972).

Synthetic stimuli were transferred to the pulse code modulation (PCM)
system (Cooper and Mattingly, 1969) where they were digitized and stored on disc
file for the preparation of experimental tapes. The PCM system allows the exper-
imenter to record two stimuli simultaneously on the two channels of an audio
tape, and to specify various relative onset values for the stimuli in each
dichotic pair. Most dichotic tapes used three relative onset times: the stop
stimulus (e.g., PAY) began 50 msec before the liquid stimulus (e.g., LAY), the
two stimuli began simultaneously, or the liquid stimulus began 50 msec before
the stop stimulus.¹ The lead times occurred with equal probability and in a
random order. The accuracy of relative onset times was within .5 msec. Channel
arrangements for stimuli in fusible pairs were always counterbalanced within a 
dichotic tape; for example, on half the trials PAY was recorded on channel A and 
LAY was recorded on channel B, while on the other half of the trials the reverse 
configuration was recorded. 

Diotic tapes were prepared for identification tasks in order to assess the 
extent to which each stimulus could be identified in isolation. 

Subjects and apparatus. One hundred twelve Yale University undergraduates
participated in nine experiments. Each received course credit for his or her 
services. The subjects were all right-handed native American English speakers 
with no history of hearing difficulty. They listened, generally in groups of 
four, to tapes played on an Ampex AG-500 dual track tape recorder. Auditory 
signals were sent through a listening station to Grason-Stadler earphones 
(Model TDH39-300Z). Gains on the tape recorder and listening station were 
adjusted so that stimuli were presented at approximately 70 db sound pressure 
level.² Earphone assignments were counterbalanced within subjects when possible,
and across subjects when experimental tapes were too lengthy to permit the
within-subject control.

Procedure. In most experiments, subjects participated in two tasks: a 
dichotic fusion task and a diotic identification task. In all cases the sub- 
jects' first task was the fusion task. The experimenter read them the following 
standard instructions for phonological fusion tasks as they read silently from 
their own copies: 

This is an experiment in speech perception. You will be listen- 
ing to a series of messages through earphones. After each pre- 
sentation, you are to write down what you heard. You will have 
to respond immediately for there will only be a few seconds 
before the next presentation begins. 

In order to do a good job, you must report exactly what you 
heard. For example, if you heard a real word, write down that
real word; if you heard a nonsense word, write that nonsense 
word; if you heard one word, write it; if you heard two words, 
write them both; and so on. If you are not sure about what you 
heard, make a guess anyway: you must write something down after 
every presentation. 

Some of the items may sound very similar to others; however, they
may in fact be different. Therefore, be careful to judge each 
presentation on its own merits. 



¹Experiment I used other lead times as well, and Experiment IV used only the
simultaneous presentation.

²The only exception was Experiment III, where intensity level was experimentally
varied.
Before the task began subjects listened to several practice pairs and wrote their
responses in order to familiarize themselves with the task and the stimuli.
After listening to the practice pairs subjects were typically curious about the
stimuli. They were reassured that the trials sometimes sounded odd to subjects,
but that they should perform the task as best they could. If questions arose
after listening to practice items, subjects were referred back to the instruction
sheet. They were not told that different stimuli were presented to different
ears until after the completion of the fusion task.

In most experiments a second task was also run: diotic identification of 
individual stimuli. Single items such as PAY, RAY, and LAY were presented in a 
random sequence and subjects wrote down what they heard. The results of the 
identification tasks were consistent across all experiments: the individual 
stimuli were highly identifiable. For the sake of flow in the discussion of the 
fusion experiments, the results of the diotic identification tasks are summarized 
in Appendix B.³ Identification tasks always followed fusion tasks so that sub-
jects were not given precise knowledge about the stimuli before the fusion task
began. 

The statistical significance of a result was determined by a sign test on 
the individual subjects' scores. The z scores and p values are given only when
results were significant at the .05 level or less. 
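
As an illustration of how a sign test yields a z score and a p value, the following sketch uses the normal approximation to the binomial distribution. The function name and the example data are hypothetical, and the text does not say whether a continuity correction was applied, so none is used here.

```python
import math


def sign_test(differences):
    """Two-sided sign test using the normal approximation.

    differences: one score difference per subject (condition A minus B).
    Ties (zero differences) are dropped, as is conventional.
    Returns (z, p). No continuity correction is applied (an assumption).
    """
    signs = [d for d in differences if d != 0]
    n = len(signs)
    if n == 0:
        raise ValueError("no nonzero differences to test")
    positives = sum(1 for d in signs if d > 0)
    # Under the null hypothesis, positives ~ Binomial(n, 0.5).
    z = (positives - n / 2) / math.sqrt(n / 4)
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical data: 10 of 12 subjects score higher in condition A.
z, p = sign_test([1] * 10 + [-1] * 2)
print(round(z, 2), p < 0.05)
```

With small numbers of subjects an exact binomial test would be the more conservative choice; the approximation is shown only because the text reports z scores.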

I. NONLINGUISTIC DIMENSIONS

Experiment I: Timing 

Timing appears to be an unimportant factor in phonological fusion. Using 
disyllabic natural speech stimuli, Day and Cutting (1970) and Day (in prepara-
tion-b) found that phonological fusion was remarkably insensitive to differences
in relative onset times of as much as 150 msec. Their results showed that
fusion occurred almost as readily when the liquid stimulus (e.g., LANKET) led the
stop stimulus (BANKET) by 150 msec, as when the stop stimulus led the liquid by
the same extent. Furthermore, fusion rates for both cases were nearly identical
with that for the simultaneous onset case. The longest relative onset time
studied to date is 150 msec. The present study examined even longer lead times
in order to determine the interval at which fusion rate drops substantially.

³The results of the identification task in Experiment VII are discussed in the
text. No identification tasks were run in Experiments IV and IX.

Method. Two fusion sets of the same general pattern were selected: the
PAY set (PAY, RAY, LAY) and the KICK set (KICK, RICK, LICK). Members of the PAY
set were 350 msec in duration, and members of the KICK set were 325 msec in
duration. Dichotic pairs were assembled for all combinations of fusible items
within a set: pairs and possible fusions were PAY/RAY → PRAY, PAY/LAY → PLAY,
KICK/RICK → KRICK, and KICK/LICK → KLICK. Eleven lead times were selected: 0,
±50, ±100, ±200, ±400, and ±800 msec. Plus and minus signs refer to relative onset
times for the same stimuli: the plus refers to pairs in which the stop stimulus
led the liquid, while the minus refers to liquid-leading pairs. There were
equal numbers of stop-leading and liquid-leading trials. Since the longest
stimuli (the PAY set) were only 350 msec in duration, the 400 and 800 msec con-
ditions involved temporally non-overlapping stimuli for both sets. All fusible
pairs were assembled into two independent tapes, each with a different random
order. Each tape consisted of 88 pairs: (2 sets of stimuli) x (2 stop/liquid
pairs per set) x (11 lead times) x (2 channel arrangements per pair). The order
of listening to tapes was counterbalanced across eight subjects.
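
The composition of each tape can be checked by enumerating the design. The sketch below is only an illustration of the factorial structure described in the Method: the channel labels, the seeds, and the function name are hypothetical, and the shuffle merely stands in for whatever randomization procedure was actually used.

```python
import random
from itertools import product

# Stop/liquid pairs within each fusion set, as described in the Method.
PAIRS = [("PAY", "RAY"), ("PAY", "LAY"), ("KICK", "RICK"), ("KICK", "LICK")]

# Lead times in msec; positive = stop stimulus leads, negative = liquid leads.
LEAD_TIMES = [0] + [sign * t for t in (50, 100, 200, 400, 800) for sign in (1, -1)]

# Hypothetical labels for the two channel arrangements.
CHANNELS = ["stop on A", "stop on B"]


def build_tape(seed):
    """Return one randomly ordered list of (stop, liquid, lead, channel) trials."""
    trials = [
        (stop, liquid, lead, channel)
        for (stop, liquid), lead, channel in product(PAIRS, LEAD_TIMES, CHANNELS)
    ]
    random.Random(seed).shuffle(trials)
    return trials


tape_1 = build_tape(seed=1)
tape_2 = build_tape(seed=2)
print(len(LEAD_TIMES), len(tape_1))  # 11 88
```

Both tapes contain the same 88 trials in different orders, with equal numbers of stop-leading and liquid-leading trials.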

Major results. As shown in Figure 2, fusion occurred most readily when the
stop stimulus led the liquid by 50 and 100 msec, where fusion rates were 63 and
59 percent, respectively. Fusion rate dropped substantially at the -200, ±400,
and ±800 msec leads, where fusion averaged 10 percent. Intermediate fusion
rates were observed at the 0, -50, -100, and +200 msec leads. No subject
deviated markedly from the group data.

Fusion rates at the short leads (0, ±50 msec) were comparable to those found in
previous studies using the same stimuli (Cutting, 1973a, b), and to those found 
in other experiments in this paper. Hence fusion rate was stable for short-lead 
items even when they appeared in a sequence with long-lead items that rarely 
fused. The fusions that did occur at long leads occurred primarily at the 
beginning of the task, and rapidly diminished thereafter. Fusion for shorter 
leads, however, continued at the same high rate throughout the task. 

The formant transitions in the stop stimuli were about 50 msec in duration, 
while those in the liquid stimuli were 150 msec in duration. These segments 
did not need to overlap in time in order for fusion to occur, since fusion rate 
was substantial for the +100 msec lead case.

Other results. In previous studies using the same stimuli at short leads
of 0 and ±50 msec, subjects usually reported a single item when they did not
fuse; for example, PAY (Cutting, 1973a, b). The liquid stimulus was rarely
reported in such cases. One-item responses, including fusions (PLAY) and non-
fusions (PAY), accounted for more than 88 percent of all responses. In the
present study, however, a wide variety of responses occurred: three kinds of
one-item responses (PLAY, PAY, or occasionally LAY), and a large percentage of
two-item responses (PAY and LAY, PLAY and PAY, PLAY and LAY, or occasionally
PLAY and PLAY). One-item responses occurred predominantly at short leads while
two-item responses occurred at the long leads. Figure 3 compares the one-item
responses at each lead with the fusion responses shown in Figure 2. The one-
item response curve included both fusions and nonfusions, and was generally
symmetrical: an equal number of one-item responses occurred when the stop 
stimulus led as when the liquid stimulus led. The fusion response curve was not 
symmetrical, since considerably more fusions occurred when the stop stimulus 
began first. 

The effects of relative onset time on phonological fusion may be summarized
by two principles. First, fusions occurred frequently when the stimuli had
relative onsets of ±100 msec or less, but infrequently when the relative onsets
were ±200 msec or more. The mean value between these relative onset values is
150 msec, a lead time which may well be the maximum relative onset at which
fusion will occur for these stimuli. High rates of fusion do occur at ±150 msec
for disyllabic natural speech stimuli (Day and Cutting, 1970; Day, in prepara-
tion-b). Second, fusions occurred more readily when the stop stimulus led the
liquid than in the reverse configuration. The extent to which the first prin-
ciple governs fusion rate is probably a function of the syllable structure and
duration of the stimuli. In fact there may be a direct relationship between 
stimulus duration and the maximum relative onset time at which fusions occur. In 
the present study all stimuli were monosyllabic words, while previous studies
(Day, 1968, 1970a, in preparation-b) used more complex disyllabic stimuli such
as BANKET/LANKET. Longer stimuli may be less sensitive to lead time differences
in the range between 100 and 200 msec. The difference between relatively simple
and more complex stimuli may also have implications for the second principle.
Day (1970a) has found that fusion rate for disyllabic pairs is nearly identical
for liquid-leading pairs and stop-leading pairs, while the present study found
differential fusion rates for these pairs.

Fusion rates for stop + /r/ and stop + /l/ stimuli were nearly identical.
However, there was a disproportionately large number of stop + /l/ responses.
For example, PAY/LAY yielded PLAY, while PAY/RAY also yielded PLAY on a large
number of trials. In fact, /l/ was substituted for /r/ in the fusion response
on nearly half of all trials in which stop + /r/ stimuli were fused. The reverse
substitution was infrequent. Day (1968), Cutting and Day (1972), and Cutting
(1973a, b) have reported this phenomenon, and it is considered in more detail in
later experiments when linguistic dimensions of the stimuli are varied. Fusion
rates for the KICK set and the PAY set in the present study were comparable.

Overview. Timing is not a crucial factor in phonological fusion, since
fusion continued to occur to a considerable extent at long lead times. Cutting
(1972) has observed that no other auditory fusion occurs with lead times of
greater than ±25 msec. The observed insensitivity of phonological fusion to
timing differences is congruent with the notion that it is a higher-level pro-
cess.

Experiments II and III: Pitch and Intensity 

Insensitivity to parameters of pitch and intensity would lend further
support to the classification of phonological fusion with the higher-level 
fusions. 

Method. The same fusion sets as in Experiment I were used: the PAY set
(PAY, RAY, LAY) and the KICK set (KICK, RICK, LICK). Two versions of each
stimulus were synthesized for Experiment II, one at a relatively high fundamental
frequency or pitch, and one at a relatively low pitch. All stimuli had a falling
pitch contour. High-pitch stimuli began at 140 Hz and fell to a value of 120 Hz,
while low-pitch stimuli began at 120 Hz and fell to a value of 100 Hz. The 20 Hz
difference in pitch in this frequency range is equivalent to a difference of
three notes on a musical scale. Thus, as shown at the top of Figure 4, the
stimuli PAY and LAY were labeled: PAY-high (for "high" pitch), PAY-low, LAY-
high, and LAY-low.
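
The musical size of the 20 Hz difference can be worked out from the frequency ratios. The sketch below assumes that "notes" can be read as equal-tempered semitones, which the text does not state; the function name is illustrative.

```python
import math


def semitones(higher_hz, lower_hz):
    """Interval between two frequencies in equal-tempered semitones."""
    return 12 * math.log2(higher_hz / lower_hz)


# Pitch contours from the Method: high falls 140 -> 120 Hz, low falls 120 -> 100 Hz.
at_onset = semitones(140, 120)   # high vs. low stimulus at the start
at_offset = semitones(120, 100)  # high vs. low stimulus at the end
print(round(at_onset, 2), round(at_offset, 2))  # 2.67 3.16
```

Under that assumption the two stimuli are separated by roughly three semitones throughout, slightly less at onset and slightly more at offset.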

Two versions of each stimulus were also synthesized for Experiment III: 
one at a relatively high intensity (70 db SPL) and one at a relatively low 
intensity (55 db SPL). Both sets had the same low pitch used in Experiment II. 
The difference between the two intensities was 15 db, a difference which made
the high-intensity stimuli perceptually about 32 times more powerful than the low-
intensity stimuli. The top of Figure 4 can again be used to display the stimuli;
instead of "high" and "low" referring to pitch level, these terms refer to inten- 
sity level. Twelve subjects served in Experiment II and 12 different subjects 
in Experiment III. 
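
The factor of 32 follows from the definition of the decibel as ten times the base-10 logarithm of a power ratio. A minimal check, with illustrative function names:

```python
def power_ratio(db_difference):
    """Ratio of acoustic powers corresponding to a level difference in db."""
    return 10 ** (db_difference / 10)


def amplitude_ratio(db_difference):
    """Ratio of sound pressures (amplitudes) for the same level difference."""
    return 10 ** (db_difference / 20)


print(round(power_ratio(70 - 55), 1))      # 31.6
print(round(amplitude_ratio(70 - 55), 2))  # 5.62
```

The 15 db difference therefore corresponds to a power ratio of about 32 and a sound pressure ratio of about 5.6.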
Figure 4: Pairings of stop and liquid stimuli for the fusion task. "Same" pairs:
PAY-high + LAY-high, PAY-low + LAY-low. "Different" pairs: PAY-high + LAY-low,
PAY-low + LAY-high.



Dichotic pairs were assembled from possible combinations of fusible items 
within each set. As shown at the bottom of Figure 4, two pairs shared the same 
value of the target dimension: for example, PAY-high/LAY-high and PAY-low/LAY-
low. The other two pairs had different values: PAY-high/LAY-low and PAY-low/
LAY-high. Pairs that shared the same pitch or intensity ("same" pairs) and
pairs that differed in pitch or intensity ("different" pairs) were presented in
random order on the same tapes. Two tapes with different random orders were 
prepared for Experiment II, and two for Experiment III. Each tape contained 96 
items: (2 sets of stimuli) x (2 stop/liquid pairs per set) x (4 pitch or inten- 
sity combinations) x (3 lead times) x (2 channel arrangements per pair). In 
order to measure differences in fusion rate as a function of the variation in 
pitch or intensity, a high overall fusion rate must be maintained for the "same" 
pairs. Expei iment I showed that the 0 and + msec leads yielded substantial 
fusion rate?, and hence these leads were used here. Within each experiment sub- 
jects listevirjd to both stimulus tapes, with a brief rest between them. 
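
The tape length follows directly from the factorial design just described. As a quick check, the sketch below multiplies the listed factors; the dictionary keys are only descriptive labels chosen for this illustration.

```python
from math import prod

# Factors of the Experiment II / III tape design, as listed in the text.
design = {
    "stimulus sets": 2,
    "stop/liquid pairs per set": 2,
    "pitch or intensity combinations": 4,
    "lead times": 3,
    "channel arrangements per pair": 2,
}

items_per_tape = prod(design.values())
print(items_per_tape)  # 96
```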

Major results. Fusion occurred readily for all stimulus pairs. In
Experiment II fusion rates were identical for pairs with the same pitch and for
pairs with different pitches: 50 percent each, as shown in Figure 5. In fact,
fusion rates were within a few percentage points for all four combinations of
pitch values.

The results of Experiment III were similar. Pairs with the same intensity 
and those with different intensities all fused at a rate of 36 percent, as shown 
in Figure 6. Again, there were no significant differences in fusion rate among
the four combinations of intensity values.

Other results. a) Fusion rates were nearly identical for stop + /r/ and
stop + /l/ stimuli. b) The /l/ substitutions were again frequent in both
studies. When PAY/RAY fused, for example, PLAY responses occurred nearly 80
percent of the time in Experiment II and nearly 60 percent of the time in
Experiment III. As reported earlier, /r/ substitutions were infrequent.
c) Fusion rates for the PAY set and the KICK set were comparable within each
study. d) Overall fusion rate for "same" pairs in Experiment III was lower than
that in Experiment II, although this difference was not significant. A possible
explanation for this difference in fusion rate may be that many of the
Experiment III subjects were "low fusers." Elsewhere it has been shown that
there is a bimodal distribution of subjects according to their fusion rates:
some subjects fuse most of the time, while others rarely fuse (Day, 1970a).
Individual differences in the present experiments are considered in the section
on additional findings. e) The effect of relative onset time was the same in
Experiments II and III as it was for the 0 and ±50 msec leads in Experiment I.
That is, fusions were more frequent at the +50 msec lead than at 0 or -50 msec.
This same pattern of results occurred in all other studies in this series, and
therefore will not be discussed again.

Overview. Within the range of values explored, pitch and intensity were not
important stimulus dimensions for phonological fusion. These findings are
consistent with the notion that phonological fusion is a higher-level phenomenon.

Discussion: Nonlinguistic Experiments (I-III) 

Phonological fusion is strikingly independent of various nonlinguistic 
characteristics of the to-be-fused stimuli. Time, pitch, and intensity
differences were explored here. In recent studies, other nonlinguistic
differences were studied. Cutting (1973f) used the same paradigm shown in
Figure 4 for another dimension, the apparent vocal tract size from which the
fusible stimuli were spoken. Stimuli were synthesized to represent a relatively
large vocal tract of a normal adult male (as used in all studies in this series)
and the vocal tract of a midget or a small child. This manipulation again had no
effect on fusion rate.

As a final demonstration of the independence of phonological fusion from
nonlinguistic bonds, another study is relevant here. In previous studies only
one parameter was varied at a time: for example, pitch, intensity, or vocal
tract size. Cutting and Day (in preparation) varied all three dimensions:
pitch, intensity, and vocal tract size. Results showed that even when all
three dimensions of the stimuli were different for the two members of a pair,
fusion rate was unaffected. For example, PAY-low (pitch)-low (intensity)-large
(vocal tract) fused as readily with LAY-high-high-small as it did with
LAY-low-low-large.

Natural speech fusible items differ in nonlinguistic dimensions despite the
care taken in uttering them. Cutting (1973a) compared the fusion rate for
natural speech pairs with that of their more accurately controlled synthetic
counterparts, and found that fusion rate was higher for synthetic items. The
results of Experiments I-III have shown that relative onset time, pitch, and
intensity do not have much influence on fusion rate. Perhaps the difference
between the fusion rates of synthetic and natural speech pairs involves stimulus
duration. The natural speech items for a given set often differed by as much as
50 msec in duration, while their synthetic counterparts were made to be equal.
The effect of stimulus duration on phonological fusion has not yet been examined.

Six fusions reconsidered . Cutting (1972) described six different types of 
auditory fusion. The relative sensitivities of five fusions to time, pitch, and 
intensity variation in the to-be-fused stimuli were listed in Table 1. Table 2 
adds phonological fusion to the higher-level fusions. Like phonetic feature 



TABLE 2: An expanded version of Table 1 including phonological fusion
among the other fusions.

                                 Timing       Pitch      Intensity

LOWER-LEVEL FUSIONS
  1. Sound Localization          <2 msec      <2 Hz      <2 db
  2. Spectral Fusion             <5 msec      <2 Hz      *
  3. Psychoacoustic Fusion       *            <2 Hz      *

HIGHER-LEVEL FUSIONS
  4. Phonetic Feature Fusion     25 msec      >20 Hz     >30 db
  5. Chirp Fusion                25 msec      >20 Hz     >30 db
  6. Phonological Fusion         150 msec     >20 Hz     >15 db

*Systematic data not available.
fusion and chirp fusion, phonological fusion is relatively insensitive to large
stimulus differences in the three nonlinguistic dimensions. The lead time value
of 150 msec was selected for phonological fusion, in part, as a mean value
between 100 msec, where fusion occurred readily for the present stimuli, and 200
msec, where fusion occurred rarely, and in part because Day (in preparation-b)
and Day and Cutting (1970) found that fusion of disyllabic stimuli occurred
readily at 150 msec leads. The pitch value of 20 Hz matches the experimental
values tested for other higher-level fusions, and preliminary work suggests that
all higher-level fusions are tolerant of much greater pitch differences. The
intensity value of 15 db reflects current knowledge since this is the largest
interval studied to date.

The maximum relative onset value at which fusion occurs varies even within
the group of higher-level fusions. Perhaps this differential sensitivity to
timing can be explained in terms of the units which are fused. In both phonetic
feature fusion and chirp fusion the units are phonetic: features of different
phonemes are combined into a single new phoneme. In phonological fusion,
however, the units are the phonemes themselves: phonemes from different stimuli
are combined into a cluster. Since phonemes are made up of phonetic features,
they are necessarily larger units than features. It follows that, since the
units which are fused in phonological fusion are larger, the maximum relative
onset value at which fusion occurs should also be larger. Indeed, phonetic
feature fusion and chirp fusion begin to disintegrate with stimulus onset time
differences of 25 msec, while phonological fusion rates remain high at
differences of 100 or 150 msec.

Higher and lower levels reconsidered . From the data presented in Table 2, 
a process model is proposed describing the differences between higher- and lower- 
level fusions, as shown in Figure 7. For the sake of simplicity, the perceptual 
system was divided into two parts: a higher-level processor and a lower-level 
processor. Two dichotic stimuli, Stimulus A and Stimulus B, necessarily enter 
the system by way of the lower-level processor and are sent upwards in the sys- 
tem. Two experimental situations have been considered: one in which there is 
no variation in nonlinguistic stimulus dimensions (cases 1 and 2), and one in 
which such variation does occur (cases 3 and 4). In the no-variation conditions,
higher- and lower-level fusions cannot be distinguished, since fusion occurs
readily for both. This situation is represented for cases 1 and 2 by the solid
lines and arrows. In the variation condition differences occur: higher-level
fusions are not disrupted by nonlinguistic stimulus variation, while lower-level
fusions are disrupted. The information in the stimuli must therefore be
extracted at different levels for the two types of fusion. In higher-level
fusions, the stimuli are combined into a single percept in the higher-level
processor (case 3), and since nonlinguistic stimulus variation has little effect on
fusion rate higher-level fusions may take place exclusively in the higher-level 
processor (hence the dashed lines for case 1). Lower-level fusions, on the 
other hand, occur only in the lower-level processor since fusion is disrupted 
when nonlinguistic variation occurs (hence the broken arrows in case 4). 

Higher-level fusions are influenced by linguistic variables and lower-level
fusions are influenced by nonlinguistic variables. One corollary to this
statement is that higher-level fusions occur only for speech stimuli, whereas
lower-level fusions occur for both speech and nonspeech. A second corollary is
that the higher-level processor shown in Figure 7 is basically a speech, or
language, processor, whereas the lower-level processor is concerned with
auditory aspects of the signal. The two processors, then, are necessarily
hierarchical: a signal which has linguistic dimensions must also have
nonlinguistic dimensions, such as pitch and intensity. A signal which has
nonlinguistic dimensions, on the other hand, need not necessarily have
linguistic dimensions.

Since phonological fusion is governed by higher-level rules, the rules of
language, the remaining experiments were designed to study the specific
linguistic levels at which fusion takes place.

II. LINGUISTIC DIMENSIONS 

Three linguistic levels were considered in the remaining experiments: the
level of semantics, the level of the phoneme, and the level of acoustic
structure as it pertains to language. The phoneme level might appear to be the
primary linguistic level at which phonological fusion occurs, since it is
phonemes which are fused into clusters. However, this need not be the case.
Therefore the two other levels were chosen to be higher (semantics) and lower
(acoustics) than the phoneme level. The experiments begin at the semantic level
and move "downward" to the phoneme and acoustic levels.

Semantic Level 

Day (1968) found that semantic cues at the word level influenced fusion
rate. Fusion rates were higher when the fused outcomes were real words than
when they were nonwords (PAHDUCT/RAHDUCT → PRODUCT vs. PAHLOW/RAHLOW → PRAHLOW).
Nonword fusions did occur (GORIGIN/LORIGIN → GLORIGIN) although at a reduced
rate.

Experiment IV: Sentence Context 

The present experiment was designed to observe the effects of semantic cues 
at the sentence level on fusion rate. Since Experiments I-III found that /l/
was frequently substituted for /r/ in fusion responses, the present study was
also designed to observe the effect of semantic context on /l/ substitutions.

Method. Two sets of stimuli were used, the PAY set (PAY, RAY, LAY) and the
GO set (GO, ROW, LOW). Fusible pairs were presented in isolation, as in
previous experiments, and imbedded in sentence contexts. The PAY set appeared in
the contexts THE TRUMPETER ___S FOR US and THE MINISTER ___S FOR US, while the
GO set appeared in THE COALS ARE ___ING AGAIN and THE TREES ARE ___ING AGAIN.
These sentences were made into dichotic pairs such that THE TRUMPETER PAYS FOR
US, for example, was presented to one ear and THE TRUMPETER LAYS FOR US to the
other. Fusible pairs appeared in both semantically appropriate and inappropriate
contexts. Appropriateness was determined in a rating experiment with separate
subjects where pairs of sentences such as THE TRUMPETER PLAYS FOR US and THE
TRUMPETER PRAYS FOR US were presented in written form and subjects were asked to
judge which was most "meaningful." The "meaningful" ratings were taken as a
measure of semantic appropriateness, and the most appropriate member of each
pair is shown in a box in Figure 8. Additional details are given in Appendix C.

Two tapes were prepared, one with fusible targets imbedded in sentences and 
the other with the target pairs presented in isolation. The sentence tape con- 
sisted of 64 items: (4 sentence frames) x (2 stop/liquid pairs per set) x (2 
Figure 8: Fusible pairs imbedded in sentence frames. The frames THE MINISTER
___ FOR US and THE TRUMPETER ___ FOR US each carried the pairs PAYS + RAYS and
PAYS + LAYS; the frames THE TREES ARE ___ AGAIN and THE COALS ARE ___ AGAIN
each carried the pairs GOING + ROWING and GOING + LOWING. Boxes mark the pairs
that yield semantically appropriate fusions.
channel arrangements) x (4 observations per sentence). Dichotic sentence pairs
were presented at a simultaneous onset with 12 seconds between trials. Subjects
wrote down the entire sentence. The no-sentence tape also had 64 pairs: (2
sets of stimuli) x (2 stop/liquid pairs per set) x (2 channel arrangements) x
(8 observations per pair). Again, only the simultaneous onset time was used, but
the intertrial interval was four seconds. Subjects wrote down "what they
heard," one word or two words, acceptable words or nonsense. Half of the 16
subjects listened first to the sentence tape and then to the no-sentence tape,
while the others listened in reverse order.

Major results. Fusion rate was significantly higher for the sentence
condition than for the no-sentence condition, as shown in Figure 9. All subjects
showed this trend (z = 3.8, p<.001). Fusion rate was 85 percent for sentence
trials and it was 65 percent for no-sentence trials, a rate comparable to that
found in previous studies.
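
The reported z value is what a sign test would give when every subject shows the effect. The sketch below is an assumption about the statistic, not something stated in this section: it uses the normal approximation to the binomial with a continuity correction, and the function name is hypothetical.

```python
from math import sqrt


def sign_test_z(n_consistent: int, n_subjects: int) -> float:
    """Normal approximation to the sign test, with continuity correction.

    Under the null hypothesis each subject is equally likely to show
    either direction of effect, so the count is Binomial(n, 0.5).
    """
    mean = n_subjects / 2
    sd = sqrt(n_subjects) / 2
    return (n_consistent - 0.5 - mean) / sd


z16 = sign_test_z(16, 16)  # all 16 subjects show the trend
z12 = sign_test_z(12, 12)  # all 12 subjects show the trend
print(f"{z16:.2f} {z12:.2f}")  # 3.75 3.18
```

Under this assumption, 16 of 16 subjects gives a value that rounds to the 3.8 reported here, and 12 of 12 subjects gives the 3.18 reported for the 12-subject experiments in this series.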

Other results. a) The order in which subjects listened to the sentence and
no-sentence tapes was not a significant factor. b) Fusion rates were comparable
for stop + /r/ and stop + /l/ items, as well as c) for both sets of stimuli in
each condition.

Figure 10 shows the percent responses in each sentence frame. Stop + /l/
responses dominated all sentence contexts even when they were semantically
inappropriate. For sentence frame 1, THE MINISTER PLAYS FOR US occurred on 74
percent of all trials. Certainly, the minister PLAYING is semantically less
likely than PRAYING even in today's society of changing roles. Likewise, in
sentence frame 3, THE TREES ARE GLOWING AGAIN occurred on 83 percent of all
trials, despite the fact that it was not judged to be very appropriate. Sentence
frames 2 and 4 yielded a more predictable set of results: in both cases stop +
/l/ fusions were judged semantically appropriate and these fusions were very
frequent. "Other" responses for all four sentence frames were primarily
responses in which only the stop stimulus was reported; for example, THE
MINISTER PAYS FOR US. Pairs of stimuli in the no-sentence condition yielded
similar liquid substitution results: for example, when PAY/RAY fused, PLAY
responses were given 85 percent of the time. The reverse substitution rarely
occurred.

The present experiment showed that meaning at the sentence level could not
account for the /l/-substitution effect. Relative frequency of occurrence of
the fused words cannot account for it either: GLOWING, for example, is much
less frequent than GROWING (Thorndike and Lorge, 1944; Carroll, Davies, and
Richman, 1971). The relative frequency of these clusters in English fared
little better as a predictor of the particular fusion response. In fact, stop
+ /r/ clusters occur almost twice as frequently as stop + /l/ (Day, 1968).
Meaning at the word level may provide a clue to /l/ substitutions. Day (1968)
found that, while subjects usually reported hearing GROCERY when given GOCERY/
ROCERY, sometimes they reported hearing GLOCERY, a nonword. In the present
series of experiments, both the stop + /r/ and stop + /l/ fusions for a given
set were acceptable English words. Day (1968, 1970a), on the other hand, chose
stimuli that could fuse meaningfully in only one direction. She found that
PAHDUCT/RAHDUCT → PRODUCT and not PLODUCT, and that GEEDY/REEDY → GREEDY and
not GLEEDY. Such results suggest that meaning at the word level can override the
/l/-substitution effect. This effect is considered again in Experiment VI and
in the section on additional findings.
Figure 10: Percent responses for all four sentence frames.



Overview. Fusion rate was increased by higher-level, semantic cues which
were outside the fusible stimuli themselves. However, sentence context cannot
fully account for fusion since fusion rates continued to be high in the
no-sentence condition.

Semantic context did not affect the type of fusion responses which subjects
reported. Stop + /l/ fusions occurred even when the stop + /r/ responses would
have been semantically more appropriate. However, the results of other
experiments suggest that semantic cues at the word level can override the
/l/-substitution effect.

Phoneme Level 

Phonological fusion occurs when phonemes from different dichotic stimuli
are perceived as a cluster. Experiments V and VI examined the phonemic
components, the stop and the liquid, to assess their importance in the fusion
phenomenon.

Experiment V: Stops and Fricatives

In Experiments I-IV stop consonants served as the first phoneme of the
to-be-fused consonant-consonant-vowel cluster. The present study compared the
fusibility of stop/liquid and fricative/liquid pairs. Both initial clusters
occur very frequently in English, but that fact does not necessarily imply that
they fuse equally well.

Method. In addition to the BED set (BED, RED, LED) and the GO set (GO,
ROW, LOW) the fricative stimuli FED and FOE were synthesized.² The fricative
stimuli were identical to the stop stimuli in duration, pitch, and intensity,
and differed only in the acoustic structure of the first 100 msec as shown in
Figure 11. Appropriate frication for the phoneme /f/ was substituted for the
formant transitions and initial vowel segments of BED and GO. A given liquid
stimulus such as LED was paired with both a stop (BED/LED) and a fricative
(FED/LED). All stimuli and possible fusions were English words or names:
BED/RED → BRED, BED/LED → BLED, FED/RED → FRED, FED/LED → FLED, GO/ROW → GROW,
GO/LOW → GLOW, FOE/ROW → FRO, FOE/LOW → FLOW.

One tape was prepared with stop/liquid pairs and another tape with
fricative/liquid pairs. Each tape consisted of 120 dichotic trials: (2 sets of
stimuli) x (2 consonant/liquid pairs per set) x (3 lead times) x (2 channel
arrangements) x (5 observations per pair). Twelve subjects listened to both
tapes: half listened first to the fricative/liquid stimuli and then to the
stop/liquid stimuli, while the others listened in the reverse order.

Major results. Fusion occurred much more readily for the stop/liquid pairs
than the fricative/liquid pairs. Fusion rates were 57 percent and 18 percent
respectively, as shown in Figure 12. This 3:1 ratio was highly significant,
with all subjects showing greater fusion rates for stop/liquid items (z = 3.18,



²The fricative /f/ was chosen because it is the only fricative in English that
clusters with both /r/ and /l/ in initial position.
Figure 12: Results of the fusion task for fricative/liquid and
stop/liquid pairs.



p<.001). Fusion rate differences cannot be accounted for by the relative
frequency of the possible fusion responses in English: for example, BLED and
GLOW are much less common than FLED and FLOW (Carroll, Davies, and Richman,
1971). Furthermore, initial /f/ + liquid clusters occur at about the same
frequency as initial /b/ + liquid clusters in English, and considerably more
frequently than initial /g/ + liquid clusters (Hultzén, Allen, and Miron, 1964;
Denes, 1965).

Perhaps differences in fusion rates for the two classes of consonants can
be accounted for on other grounds, such as relative encodedness. Liberman,
Cooper, Shankweiler, and Studdert-Kennedy (1967) have defined encodedness as the
general amount of acoustic restructuring that a phoneme undergoes in different
contexts. Stop consonants are highly encoded, whereas fricatives are only
moderately encoded. Perhaps highly encoded phonemes combine more easily with
other phonemes to produce a cluster.

Other results. a) The order in which subjects listened to the stop and
fricative tapes was not a significant factor. b) Fusion rates for stop + /r/
and stop + /l/ stimuli were again within a few percentage points. Fricative +
/r/ and fricative + /l/ fusion rates were also comparable. c) The /l/
substitutions were frequent for stop/liquid pairs, but not for fricative/liquid
pairs. In fact, /r/ substitutions occurred on 64 percent of all trials in which
fricative + /l/ stimuli fused, while corresponding /l/ substitutions were rare.
No explanation is apparent for this reversal in liquid substitutions for
fricative/liquid stimuli.

A second type of substitution also occurred. Fricative/liquid stimuli did
not always yield fricative + liquid responses; for example, FED/RED → BRED. In
fact, about 70 percent of all fricative/liquid pair fusions were actually stop +
liquid responses. Stop-for-fricative substitutions were not the result of poor
fricative stimuli, since subjects identified them in isolation on the diotic
test with a high degree of accuracy (see Appendix B). Instead, these
substitutions appear to be an extension of the differences in fusibility between
the stops and fricatives.

Overview. When the first consonant in the to-be-fused cluster was a stop,
fusion rate was high, but when it was a fricative, fusion rate dropped.

Experiment VI: Liquids and Semivowels

The present study examined the second consonant in the to-be-fused cluster.
Stimuli beginning with semivowels (that is, /w/ and /y/) were prepared to see
whether they would fuse as readily as the liquids, and to see whether the
/l/-substitution effect would be extended to /w/ and /y/.

Method. Two sets of stimuli were used: the KICK set (KICK, RICK, LICK,
WICK) and the COO set (COO, RUE, LIEU, YOU).³ Liquid and semivowel stimuli



³The only stop consonant that clusters with all liquids and semivowels in
English is /k/; yet /ky/ occurs only before the vowel /u/, while /kw/ does not
occur before /u/. Thus, it was necessary to synthesize two sets of stimuli:
one for /r, l, w/ and the other for /r, l, y/. Note that LIEU could also be
represented LOU.
within the same set were identical in all respects except for the direction and
slope of the second and third formant (F2 and F3) transitions, as shown in
Figure 13. These are the cues that distinguish all liquids and semivowels
(O'Connor et al., 1957; Lisker, 1957).

Figure 14 shows the stimuli and the possible fusion responses for both
sets. All were words or names common in English. A tape was prepared with 108
dichotic items: (2 sets of stimuli) x (3 liquid and semivowel stimuli per set)
x (3 lead times) x (2 channel arrangements) x (3 observations per pair). Twelve
subjects listened to two passes through the tape, reversing headphones after the
first pass.

Major results. Fusion rate was comparable for pairs within a particular
stimulus set, as shown in Figure 15. KICK/RICK, KICK/LICK, and KICK/WICK pairs
all fused at an average of 70 percent, while COO/RUE, COO/LIEU, and COO/YOU all
fused at an average of 42 percent. There were no significant differences within
each set.

Other results. Regardless of which stimuli were presented, most responses
were stop + /l/; /l/ was substituted for /r/, as in previous studies, and it was
also substituted for /w/ and /y/. CLICK and CLUE responses occurred in 89
percent of all trials in which fusions occurred. Again, word frequency of the
possible fusions cannot account for the substitutions. For example, QUICK is
much more common than CLICK, and CREW is more common than CLUE (Carroll, Davies,
and Richman, 1971). Nevertheless, the /l/ substitutions for both sets of
stimuli yielded relatively common English words. The data of Day (1968) suggest
that when /l/ substitutions do not yield acceptable words, they occur
considerably less often.

The KICK set fused more than the COO set. Word frequency cannot account
for this difference. Other possible causes of differential fusion rates among
stimulus sets include cluster frequency, phonetic differences in the stop
stimuli, and phonetic differences in the vowels (see Cutting, 1973a).

Overview. The role of the second consonant in the to-be-fused cluster is
less clear than that of the first consonant. For the present stimuli, fusion
occurred equally well for all stop/liquid and stop/semivowel pairs, yet all
pairs tended to yield a stop + /l/ response.

Acoustic Level 

Linguistic cues at the sentence and phoneme levels have been shown to 
affect fusion rate (Experiments IV-VI). Perhaps linguistic cues at the level 
of acoustic structure are also important. Since the liquid is perceptually
interpolated between the stop and the vowel, one key to fusion may lie in its
acoustic structure. Experiments VII-IX examined various aspects of the acoustic
structure of liquids.

Experiment VII: Liquid Transitions 

Experiment VI showed that for the present sets of stimuli, liquids (/r, l/)
and semivowels (/w, y/) tended to yield stop + /l/ fusions. Since /l/, /w/, and
/y/ have falling F3 transitions while /r/ has a rising F3, and since /r/, /l/,
Figure 14: Stimuli and possible fusions when the liquid-initial
stimulus is varied.
and /w/ have rising F2 transitions while /y/ has a falling F2 (see Figure 13),
it appears that any combination of rising and falling F2 and F3 transitions is
sufficient for stop + /l/ fusions to occur. If any combination is sufficient
for fusion, perhaps no transitions are needed at all. For example, PAY/AY, a
pair without any liquid transitions, might also yield stop + /l/ fusions. The
present study varied the slope of the liquid transitions to determine how much,
if any, transition was necessary for fusion to occur, and to confirm that
fusion is not a response bias.

Method. The PAY set and the KICK set were expanded to include five stimuli,
one stop stimulus and four stimuli which formed a liquid-to-vowel continuum, as
shown in Figure 16. At one end of the continuum, Stimulus 1 had full liquid
transitions in all formants as found in LAY and LICK. At the other end of the
continuum, Stimulus 4 had the same duration but began with a steady-state vowel,
AY and ICK. Between the extremes were Stimuli 2 and 3, which had intermediate
amounts of formant transitions. Equal increments of acoustic change occurred
between successive stimuli.

Sixteen subjects served in two tasks, identification of the liquid-to-vowel 
stimuli in isolation and dichotic fusion. Since the results of the identifica- 
tion task were highly relevant to the fusion task, those data are discussed 
first. 

Task 1: Identification of the liquid-to-vowel stimuli

Tapes and procedure. A diotic identification tape of 80 randomized
liquid-to-vowel items was prepared: (2 sets of stimuli) x (4 stimuli per array)
x (10 observations per stimulus). There was a three-second interval between each
item. Subjects wrote down the single item that they heard presented on each
trial.

Results. Stimuli 1 and 2 were identified as beginning with /l/ on 88
percent of all trials, as shown in the top part of Figure 17. Stimuli 3 and 4
were identified as beginning with /l/ on only 8 percent of all trials. All
subjects showed this quantal trend (z = 3.8, p<.001). These results demonstrate
the well-known fact of categorical perception in certain speech sounds (see
Liberman, 1957; Pisoni, 1971). Equal amounts of change along a physical
dimension were not perceived as equally spaced, but instead were perceived in
groups with a distinct boundary between Stimulus 2 and Stimulus 3. There was no
difference between the LAY-to-AY and LICK-to-ICK stimulus arrays.

Task 2: Fusion

Tapes and procedure. Dichotic items were constructed by pairing the
stop stimuli with all items in the liquid-to-vowel arrays. The tape consisted
of 96 pairs: (2 sets of stimuli) x (4 stimuli per array) x (3 lead times) x (2
channel arrangements) x (2 observations per pair). Subjects listened to two
passes through the tape, reversing headphones after the first pass. As usual
they wrote their responses, indicating what they heard.

Results . Fusion occurred at a rate of 52 percent for pairs containing 
Stimulus 1 or Stimulus 2, the stimuli which had been identified as beginning 
with a liquid. Other pairs yielded only 6 percent fusions, as shown in Figure 
17. No subject deviated markedly from the group data. 
Figure 17: Results of identification and fusion tasks involving

Overview. Liquid-like transitions were necessary for the phonological
fusion of dichotic stop/liquid stimuli. Fusion occurred in direct proportion
to the extent that "liquid-like" items were indeed perceived as liquids in
isolation.

Experiment VIII: Degraded Liquids 

Experiment VII showed that transitions in the liquid stimulus were
necessary for fusion to occur. The present experiment was designed to determine
which formant transition (or combination of transitions) is necessary for fusion.

Method. The PAY set and the KICK set were again used. Liquid stimuli
appeared in many forms. Some were degraded in that certain formants were omitted
from their acoustic structure. Figure 18 shows the component parts of the liquid
stimuli. There were two possible third formants, that for /r/ and that for /l/.
F3/r/ represents the third formant of the /r/ stimuli, while F3/l/ represents the
third formant of the /l/ stimuli. All possible combinations of F1, F2, F3/r/, and
F3/l/ were used. Eleven liquid-like stimuli resulted in each set: 2
three-formant stimuli identical to those used in previous studies, 5 two-formant
stimuli, and 4 one-formant stimuli. Two-formant stimuli were F1+F2, F1+F3/r/,
F1+F3/l/, F2+F3/r/, and F2+F3/l/. One-formant stimuli were F1, F2, F3/r/, and
F3/l/.
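
The count of eleven stimuli can be reproduced by enumeration. The sketch below assumes, as the listed stimuli imply, that a stimulus contains at most one third formant (either F3/r/ or F3/l/, never both); the variable names are illustrative only.

```python
from itertools import product

# Each stimulus takes any subset of the two lower formants...
low_formants = [(), ("F1",), ("F2",), ("F1", "F2")]
# ...and at most one of the two possible third formants (assumption).
third_formants = [(), ("F3/r/",), ("F3/l/",)]

# Drop the empty combination, which would be silence rather than a stimulus.
stimuli = [low + f3 for low, f3 in product(low_formants, third_formants) if low + f3]

# Group the stimuli by the number of formants they contain.
by_size = {n: [s for s in stimuli if len(s) == n] for n in (1, 2, 3)}

for n in (3, 2, 1):
    print(n, ["+".join(s) for s in by_size[n]])
```

This yields 2 three-formant, 5 two-formant, and 4 one-formant stimuli, matching the breakdown given in the text.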

Each of the 11 liquid-like stimuli was paired with its appropriate stop
stimulus. In addition two control pairs were constructed per set: one pair was
a stop/stop pair (PAY/PAY and KICK/KICK), and the other was a stop presented to
one ear and nothing to the other (PAY/___ and KICK/___). No fusion responses
should occur for control pairs if fusion occurs only for pairs containing
liquid-like stimuli. A dichotic tape of 156 items was constructed: (2 sets of
stimuli) x (13 pairs per set) x (3 lead times) x (2 channel arrangements per
pair). Twelve subjects listened to one pass through the tape.

Major results. There were two general fusion rates: most experimental
pairs fused at a rate of about 55 percent, while a few pairs fused at a
considerably lower rate, as shown in Figure 19. Pairs which rarely fused
contained liquid-like stimuli with only F1 or F3/l/ and no other formant.
Figure 18 shows that these stimuli lacked formant transitions in the
mid-frequency range (1000-2000 Hz), while all other liquid-like stimuli had
transitions in this region (either F2 or F3/r/).

Stop + three-formant liquid stimuli fused at rates comparable to previous
studies (54 percent), with no significant difference between stop + /r/ and stop
+ /l/. Stop + two-formant liquid pairs fused at a rate of 52 percent, with the
exception of the stop + (F1+F3/l/) case, where the fusion level was only 23
percent. All subjects showed this drop in fusion rate (z = 3.18, p < .001). Stop +
one-formant liquid pairs also showed high and low fusion rates. Stop + F2 and
stop + F3/r/ liquid pairs fused at a rate of 62 percent, while stop + F1 and stop
+ F3/l/ liquid pairs fused at a rate of only 18 percent. Again, all subjects
showed these quantal differences in fusion rate (z = 3.18, p < .001).
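
The text does not name the test behind the reported statistic. One reading that reproduces the value is a sign test across the 12 subjects, using the normal approximation with a continuity correction; the sketch below works through that arithmetic. This is an assumption about the analysis, not a description of the procedure actually used, and the function names are hypothetical.

```python
import math


def sign_test_z(n_consistent, n_total):
    """Normal approximation to the sign test, with continuity correction.

    Under the null hypothesis each subject is equally likely to show the
    effect in either direction, so the count has mean n/2 and s.d. sqrt(n)/2.
    """
    mean = n_total / 2
    sd = math.sqrt(n_total) / 2
    return (abs(n_consistent - mean) - 0.5) / sd


def upper_tail_p(z):
    """One-tailed probability of a standard normal deviate exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = sign_test_z(12, 12)   # all 12 subjects show the drop
p = upper_tail_p(z)
print(round(z, 2), p < 0.001)  # 3.18 True
```

Under this reading, any comparison in which all 12 subjects agree in direction yields the same z, which would explain why the value 3.18 recurs.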

Other results. a) As in previous studies, stop + /l/ responses occurred on
more fused trials than stop + /r/. b) Stop/stop pairs yielded few stop + /l/
responses. Such responses would be "false fusions" since the subject would be
reporting a liquid which has not, in fact, been presented (Day, 1968).



Figure 18: Schematic spectrograms of liquid-initial stimuli
and their component parts.

Figure 19



Overview. A specific acoustic cue for phonological fusion of stop-liquid
stimuli appears to be the liquid formant transition F2 or F3/r/. A single rising
formant transition in the range 1000 to 2000 Hz was necessary for fusion to
occur.

Experiment IX: Liquid "Chirps"

The previous study found that a mid-range formant transition was necessary
for fusion to occur. The present study was designed to determine whether that
transition per se is sufficient for fusion when paired with a stop stimulus.

When the F2 transition is removed from a liquid stimulus and synthesized by
itself, it sounds similar to a bird's twitter, hence the name "chirp."
Mattingly, Liberman, Syrdal, and Halwes (1971) and Wood (1973) have found that
these "chirps" are not processed as speech. Chirp stimuli have two general
features, relative frequency range and direction (rising vs. falling).

Method. Two stop stimuli were synthesized, PAY and KICK. Liquid stimuli
were degraded so that only the F2 transition remained, a 100 msec chirp rising
rapidly from a value of 1200 Hz to 1800 Hz. This and other chirps were
synthesized at the same amplitude as the F2 transition in the full liquid stimulus.
Twelve chirp stimuli were used; there were four frequency values for rising,
falling, and steady-state chirps, as shown in Figure 20. Specific stimuli are
numbered from low to high representing ordinal position on the frequency scale.
Rising chirps were designated with a superscript "r," falling chirps with "f,"
and steady-state chirps with "s." Endpoints for rising and falling chirps were
600, 1200, 1800, 2400, and 3000 Hz, while steady-state chirps had frequencies of
900, 1500, 2100, and 2700 Hz. The original F2 transition from the liquid
stimuli was the chirp 2^r.
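
The twelve chirps can be laid out as a small table of start and end frequencies. The sketch below assumes that each rising or falling chirp spans two adjacent endpoint values, an inference from the five listed endpoints and from the 1200 to 1800 Hz span of the original F2 transition. The keys such as "2r" are a plain-text stand-in for the superscript notation, and the function name is hypothetical.

```python
ENDPOINTS_HZ = [600, 1200, 1800, 2400, 3000]


def chirp_set(endpoints):
    """Map each chirp label to its (start_hz, end_hz) pair.

    Assumes chirp number i spans the i-th pair of adjacent endpoints;
    the steady-state chirp sits at the midpoint of that span.
    """
    chirps = {}
    for number in range(1, len(endpoints)):
        low, high = endpoints[number - 1], endpoints[number]
        mid = (low + high) // 2
        chirps[f"{number}r"] = (low, high)   # rising
        chirps[f"{number}f"] = (high, low)   # falling
        chirps[f"{number}s"] = (mid, mid)    # steady-state
    return chirps


chirps = chirp_set(ENDPOINTS_HZ)
steady = [chirps[f"{n}s"][0] for n in range(1, 5)]
print(len(chirps), chirps["2r"], steady)
```

With these assumptions the steady-state frequencies come out as the midpoints 900, 1500, 2100, and 2700 Hz, matching the values listed above.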

In addition to the stop/chirp pairs, there were some control pairs.
Ordinary pairs such as PAY/RAY and PAY/LAY were used to obtain baseline fusion rates.
Stop/stop pairs were also included to set a lower boundary on fusion rate since
Experiment VIII found few stop + liquid responses for such trials. Hence the
control pairs provided boundary conditions within which to compare the fusion
rates for stop/chirp items. Three lead times were used such that stop/chirp
stimulus pairs had the same temporal relationships as the stop and the F2
transition of the full liquid in previous experiments. A dichotic tape of 180 items
was prepared: (2 sets of stimuli) x (15 pairs per set)* x (3 lead times) x (2
channel arrangements per pair). Twelve subjects listened to one pass through the
tape.

Major results. Fusions occurred at a substantially reduced rate for all
stop/chirp pairs, as shown in Figure 21. While the fusion rate for stop/liquid
control pairs was 47 percent, a rate comparable to previous studies, fusion
rates for stop/chirp pairs averaged only 8 percent. The difference was highly
significant (z = 3.18, p < .001).



*Twelve stop/chirp pairs plus control pairs such as PAY/LAY, PAY/RAY, and
PAY/PAY.

Fusions did occur, however, for selected stop/chirp pairs. Pairs with
rising chirps (1^r, 2^r, 3^r, and 4^r) yielded an average fusion rate of 14
percent, higher than all other stop/chirp pairs combined (z = 2.6, p < .005).
Within the category of rising chirps fusion occurred most readily for stop/2^r
pairs, where the fusion rate reached 24 percent. Eight of 12 subjects fused at a
higher rate for these pairs than for any other stop/chirp pair (z = 4.0,
p < .0001), but even here fusions occurred significantly less often than for the
stop/liquid control pairs (z = 3.18, p < .001). Thus, even the chirp stimulus
most appropriate to the full liquid is not entirely sufficient for fusion to
occur at an unreduced rate.

Other results. a) The /l/ substitutions occurred for stop/liquid control
pairs at rates comparable to previous studies. b) Fusions of stop/chirp pairs,
however, were not dominated by stop + /l/ responses. In fact, only stop/2^r
pairs yielded more stop + /l/ fusions than stop + /r/, /w/, or /y/ fusions.
Fusions for lower frequency chirps (1^r, 1^f, and 1^s) were dominated by stop +
/w/ responses, while those for higher frequency chirps (4^r, 4^f, and 4^s) were
dominated by stop + /y/ and stop + /l/ responses. c) "False" fusion responses for
stop/stop control pairs occurred only 2 percent of the time.

Overview. The fusion rate for stop/chirp pairs was low. Hence the F2
transition in the liquid stimulus was necessary but not entirely sufficient for
fusion to occur. Fusions did occur for pairs consisting of only the F2
transition and the stop stimulus, but they occurred much less frequently than for the
ordinary stop/liquid pairs. Highest fusion rates among stop/chirp pairs occurred
for pairs containing the chirp stimulus whose frequency and direction most nearly
matched that of the normal liquid stimulus.

Summary of Linguistic Experiments (IV-IX) 

Three levels of linguistic cues were explored (the semantic level, phoneme
level, and acoustic level), and the effect of cues at each level on fusion rate
was observed. The results suggest that the cognitive processes involved in
phonological fusion are influenced by cues at all three linguistic levels.
Fusion rate was enhanced when fusible pairs were imbedded in sentence contexts,
fusion occurred best for certain classes of phonemes, and specific acoustic
cues were found which are important for fusion. Linguistic cues within and
outside of the consonant/liquid pairs had a distinct effect on fusion rate.

ADDITIONAL FINDINGS 

There are several aspects of the present data which are not primary to the
major focus of the paper but which provide additional information about the
phenomenon of phonological fusion: a) individual differences in fusion rate,
b) changes in fusion responses over time, c) ear effects, and d) /l/
substitutions.

Individual Differences

In order to look at individual differences in fusion rates in the present
experiments, subjects were selected from those studies in which the specific
experimental variables had little effect on fusion rate. The studies considered
were Experiments II, III, IV, and VI. In addition the results of Cutting
(1973a) were also included since the stimuli used in that study were the same as
in the present series. Thus, there were 64 subjects in all.

The distribution of fusion rates for these subjects is shown in Figure 22. 
Note that the distribution is bimodal, with clusters of subjects at the high and 
low ends. The general shape of this distribution was representative of all 
experiments in the present series and is similar to that shown by Day (1970a) 
for subjects who listened to fusible natural speech pairs.

Bimodal individual difference functions in the fusion task appear to reflect 
different types of processing, and have been found to correlate with performance 
on other tasks involving the perception of fusible stimuli (Day, 1970a). One 
such task involves the temporal-order judgment (TOJ) of the initial phonemes in 
a stop + liquid pair when the stimuli have different relative onset times. 
Those subjects who fused at a low rate were accurate in judging temporal order. 
However, those subjects who fused at a relatively high rate were poor judges of 
temporal order: regardless of whether the stop or liquid stimuli began first in 
time, they reported that the stop phoneme began first on most trials. Thus, the 
high fusers were constrained by the phonological rules of English in the TOJ task, 
and have been called "language-bound" (Day, 1970a). The other group of subjects
has been called "stimulus-bound," because they are able to determine accurately
the stimulus events. Recent findings suggest that these two groups of subjects
retain their group identity for other cognitive abilities such as digit-span
memory tasks (Day, 1973), pattern recognition tasks, and secret language tasks
(Day, in preparation-c).

Large individual differences appear to occur primarily in higher-level
cognitive tasks. Turvey (1973), for example, found that individual differences
were greater for higher-level, more central visual processes than for
lower-level, more peripheral visual processes. Individual differences have been
reported in at least two other higher-order visual tasks. Rommetveit and his
co-workers (Rommetveit, Berkeley, and Brøgger, 1968; Rommetveit and Kleven,
1968; Rommetveit, Toch, and Svendsen, 1968a, 1968b) have found marked individual
differences in a visual analog to the phonological fusion task. When the letters
SHAR were presented to one eye and SHAP to the other, many subjects reported
seeing the word SHARP. Other subjects did not fuse the stimuli. Rommetveit called
these subjects "nonveridical" and "veridical" perceivers, respectively, and they
may be analogous to the language-bound and stimulus-bound subjects in
phonological fusion studies. Messmer (cited by Huey, 1908:91ff) also found large
individual differences for a dioptic visual task. "Subjective" perceivers were
those who, when presented with a stimulus such as INSPECTIXN, never perceived
that there was anything amiss. "Objective" perceivers, on the other hand, were
quick to report errors in orthography.

Changes in Fusion Responses Over Time 

Fusion rates over time were examined for the same 64 subjects discussed in
the individual differences section. The top half of Figure 23 shows fusion rate
divided in quartiles of the test. Fusion rate was stable throughout the tests,
averaging about 60 percent for each quartile across the various experiments.



For Experiment IV only the no-sentence context condition was considered.

Figure 23: Percent fusion and average number of responses for
each quarter of the fusion tasks.



Thus, the subjects appeared to be no more "informed" about the nature of the
stimuli on the last trial than they were on the first. This conclusion, however,
must be qualified somewhat since fusion rate by itself does not reflect the
entire nature of the subjects' responses.

During the course of the task, subjects began to report hearing more than
one item per trial. Given the stimuli PAY/LAY, most responses were single
items, for example, PLAY or PAY. However, double responses occurred with
increasing frequency over the course of the task. For example, subjects reported
PAY and LAY, PLAY and PAY, PLAY and LAY, or even PLAY and PLAY. As shown at the
bottom of Figure 23, double responses occurred on only 1 in 25 trials in the
first quartile, and increased linearly to 1 in 5 trials by the last quartile.
This increase indicates that subjects became more aware that two different
stimuli were presented on each trial as the task progressed. However, since fusion
rates remained constant throughout the task, it is clear that the subjects were
still unaware of the specific nature of the separate stimuli.

Ear Effects 

Fusion occurred equally often for cases where the stop-initial stimulus was
presented to the right ear and the liquid-initial stimulus to the left ear, and
for the reverse configuration. Hence there were no ear effects for fused trials.
Many experimenters have found a right-ear advantage for dichotic speech stimuli
(for a review, see Studdert-Kennedy and Shankweiler, 1970). Previous
phonological fusion studies, however, did not yield ear differences, nor were any
obtained in the present experiments. Ear differences for speech stimuli occur
only when the items cannot be combined into a single percept. In such cases an
input competes with its opposite-ear rival for a processor which typically
cannot handle both of them at the same time. Information loss results from this
competition and ear effects reflect the general loss of information from each
ear. In phonological fusion there is no information loss: if the stimuli are
PAY/LAY and the subject reports hearing PLAY, he has reported all of the
linguistic units presented to him, reorganized into a perceptual whole. Without
the loss of information there can be no decrement in overall performance, and
hence no ear effect can result.

Ear advantages also failed to occur for nonfused trials. For single
responses the right-ear stimulus was reported 30 percent of the time and the
left-ear stimulus 50 percent of the time. Typically in double responses all of the
information in both stimuli was reported, and therefore no ear effect could
result. Ear scores can also be measured in terms of which item was reported
first, the right-ear stimulus or the left-ear stimulus, but in the present
series of studies this analysis again yielded no ear effects.

The /l/ Substitutions

Given the stimuli PAY/RAY subjects often reported hearing PLAY.
Difficulties with /r/ vs. /l/ occur in a wide variety of other situations besides
phonological fusion. Children, for example, have more difficulty in pronouncing /r/
than /l/ and sometimes pronounce both phonemes as /l/ (Morley, 1957; Powers,
1957; Hurray, 1962); the deaf have more trouble processing /r/ than /l/, and
often hear /l/ in both cases (Kosen, 1962); /r/ has a less stable articulation
pattern than /l/ (Bronstein, 1960; Delattre, 1967); /r/ may yield more
metathesis [spoonerism] errors than /l/ (see Cutting and Day, 1972); /r/ is more
difficult to pronounce correctly under conditions of delayed auditory feedback
than /l/ (Applegate, 1968); and /r/ yields a more varied pattern of ear
advantages than /l/ in certain dichotic listening tasks not involving fusion
(Cutting, 1973d). Stress on perception and production systems, then, is more
harmful to /r/ than /l/. The dichotic fusion results are complementary to these
studies.

In phonological fusion /l/ substitutions cannot be accounted for by
stimulus identifiability. In fact, /r/ stimuli were slightly better identified than
/l/ stimuli (see Appendix B). Word frequency and cluster frequency cannot
account for the substitutions either. Furthermore, the /l/-substitution effect
appears to override all linguistic cues except the semantic cues at the word
level.

Cutting (1973b) found that subjects not only reported hearing PLAY when
PAY/LAY and PAY/RAY were presented, for example, but that they also could not
discriminate between the fusible pairs. In other words, PAY/LAY and PAY/RAY not
only tended to yield the same fusion response, but they also were virtually
indistinguishable. These results suggest that there may be specific acoustic cues
in the dichotic listening situation which might be pertinent for the perception
of stop + /l/ clusters and improper for the perception of stop + /r/.

GENERAL DISCUSSION 

The studies in the present paper showed that phonological fusion was
vigorously independent of nonlinguistic stimulus variation, but sensitive to
linguistic variation at all the levels that were studied. This overview suggests that
there is a marked difference in the way linguistic and nonlinguistic stimulus
dimensions may be processed.

Higher- and lower-level dimensions. Linguistic and nonlinguistic dimensions
of auditory stimuli are hierarchically related: linguistic dimensions imply the
existence of nonlinguistic dimensions, whereas nonlinguistic dimensions may
occur without any linguistic dimension present. For example, it is impossible
to have a stimulus (such as the word BEE) which has linguistic attributes but no
nonlinguistic attributes, such as time constraints, pitch, and intensity. On
the other hand, it is commonplace to have a stimulus (such as a tone) which has
nonlinguistic attributes but has no linguistic properties. These attributes
have been called higher-level and lower-level stimulus dimensions.

Higher- and lower-level processing. Given their hierarchical relationship
one might assume that the processing of nonlinguistic dimensions necessarily
occurs before linguistic processing, and that linguistic analyses might be
contingent on the outcome of previous nonlinguistic analyses. Indeed, for some
tasks this appears to be the case. Day and Wood (1972) and Wood (1973), for
example, found results supporting this notion in diotic forced-choice reaction
time tasks: irrelevant variation along a nonlinguistic dimension impeded
performance on a task involving linguistic decisions, whereas irrelevant linguistic
variation had little effect on nonlinguistic task performance.

Dichotic phonological fusion, however, demonstrates that linguistic analysis
can occur independent of nonlinguistic constraints on the to-be-fused stimuli.
Fusible stimuli can vary widely in relative onset, pitch, and intensity with
little if any effect on perception. When different stimuli are presented to
opposite ears, the interaction that occurs between them is qualitatively
different from that which occurs for stimuli presented diotically or monaurally
(Studdert-Kennedy, Shankweiler, and Schulman, 1970; Brady-Wood and Shankweiler,
1973; Cutting, 1973e). In the diotic and monaural cases, the interaction occurs
at a lower level, where timing, pitch, and intensity are important. In the
dichotic case the constraints of these dimensions may be by-passed so that
stimuli may interact at a higher level.

However, not all dichotic tasks are free from nonlinguistic bonds. For
example, three of the six auditory fusions mentioned at the beginning of this
paper (and discussed in more detail in Cutting, 1972) are sensitive to timing,
pitch, and intensity differences between the stimuli. These are lower-level
fusions which occur for both speech and nonspeech stimuli. The higher-level
fusions, which are not dependent on nonlinguistic cues in the stimuli, occur
only for speech sounds. In such cases, it is not raw stimuli that interact in
the higher-level processor, but linguistically coded information in the form of
phonemes or phonetic features.

The process of phonological fusion. Higher-level fusions appear to occur
solely in the language processor where fusible linguistic units may be
perceptually combined. The nonlinguistic dimensions of the stimuli have either been
discarded at this point or sent to a different processor. Yet when the fusible
stimuli entered the system they carried both linguistic and nonlinguistic
attributes on the same waveforms. The process model considered previously (Figure 7)
is helpful in explaining how linguistic and nonlinguistic information might be
separated, and how phonological fusion occurs.

The higher-level processor is an information coder which codes incoming
speech signals at enormous savings in terms of the amount of information needed
to be stored. Liberman, Mattingly, and Turvey (1972) have estimated that this
coder typically reduces a 40,000 bit-per-second acoustic signal into a 40
bit-per-second phonetic code suitable for further linguistic analysis. The cost of
this coding process, however, is the quick loss in availability of nonlinguistic
information. The linguistic attributes of the signal are digitized (coded),
while the nonlinguistic dimensions appear to remain in nondigital, raw form.

Dichotic stimuli appear to maintain their separate integrities while being
transmitted primarily to the hemisphere opposite from the ear of arrival (Milner,
Taylor, and Sperry, 1968; Sparks and Geschwind, 1968). While language processing
occurs primarily in the left hemisphere for most people, a certain amount of
linguistic coding may occur in the right hemisphere (see Gazzaniga, 1967). Thus
two speech signals, one from the right ear and one from the left, may be analyzed
(coded) independently in separate hemispheres. Phonological fusion, then,
appears to be the integration of two coded representations of the fusible
stimuli.

Semantic, phonemic, and acoustic dimensions of the fusible stimuli often
have an effect on fusion rate. It appears that once the fusible stimuli have
been linguistically coded, this coded information interacts with other
linguistic information at different levels of language. Given the appropriate
experimental situation, cues from all three levels appear to work in concert. Consider
one of the sentence pairs from Experiment IV: THE TREES ARE GOING AGAIN
presented to one ear and THE TREES ARE ROWING AGAIN presented to the other ear.
Semantic cues at the sentence level increased the fusion rate for GO/ROW pairs
beyond the rate they yielded when presented in isolation. Cues at the phoneme
level were also influential in maintaining a high fusion rate. For example,
Experiment V found that GO/ROW pairs fused at a higher rate than FOE/ROW pairs,
indicating that certain phoneme classes fuse more readily than others.
Experiment VIII found that the second formant transition in the liquid was a specific
cue for fusion, and when it was not present the fusion rate was considerably
reduced. Furthermore, this cue appears to be pertinent to the /l/-substitution
effect: fusion rates and /l/ substitutions were approximately equal for stop/F2
and stop/liquid stimuli. Thus the high rate of THE TREES ARE GLOWING AGAIN
responses appears to be the result of the synergy of cues from three very
different linguistic levels.

APPENDIX A - ACOUSTIC STRUCTURE OF STIMULI 

Stimuli within a particular set were identical in all respects except for
the acoustic structure of the first 150 msec. Stop stimuli began with
appropriate 50 msec transitions ending at the steady-state frequencies of the following
vowel. Liquid stimuli followed a pattern suggested by O'Connor et al. (1957):
each liquid began with a 50 msec steady-state onglide, followed by 100 msec
transition in F2 and F3 and a 20 msec transition in F1, followed by a vowel.
Within a particular set, liquid stimuli differed only in the steady-state onglide
of F3, and in the F3 transition, the cue most important for the separation of /r/
from /l/ (O'Connor et al., 1957).

Since previous studies had found many /l/ substitutions, an effort was made
to make /r/ stimuli as highly identifiable as possible. Thus, when there was
any doubt as to what values were to be selected for /r/ and /l/ stimuli,
decisions were always made to favor /r/ rather than /l/. The choice in duration of
the steady-state onglides, the duration of the F2 and F3 transitions, and the
frequency value of the F1 steady-state onglide (311 Hz) were three such
decisions. Identification results proved that /r/ stimuli were somewhat more
identifiable than /l/ stimuli (see Appendix B). Thus, /l/ substitutions cannot be
accounted for by the identifiability of the /r/ stimuli.

APPENDIX B - IDENTIFICATION TASK RESULTS 

Subjects participated in identification tasks after fusion tasks so that
specific information about the individual stimuli gained in the identification
task could not influence their fusion results. Ten tokens of each stimulus used
in a particular experiment were presented singly in a random order with three
seconds between items. Subjects were instructed to write down what they heard
after each presentation. In some experiments they were free to write whatever
they heard, while in others they chose among a limited repertoire of initial
phonemes. The results were the same regardless of the instructions: averaging
over all experiments, stop stimuli were correctly identified on 95 percent of
all trials, fricative stimuli 96 percent, /r/ stimuli 94 percent, /l/ stimuli
90 percent, and semivowel stimuli (/w/ and /y/) 83 percent. When errors occurred
they were primarily within-phoneme-class errors. Thus, stops were identified as
stops, fricatives as fricatives, and liquids and semivowels as liquids and
semivowels. Identification results were similar for all subjects in all experiments.

APPENDIX C - SEMANTIC APPROPRIATENESS OF SENTENCES



Twenty subjects who did not participate in any of the fusion experiments 
were asked to rate which of two alternative forms of a sentence was the most 
"meaningful": for example, THE MINISTER PRAYS FOR US vs. THE MINISTER PLAYS FOR 
US. All sentences were possible fusion responses in Experiment IV. Results 
shown in Table C-1 indicated that most subjects agreed as to which sentence of 
each pair was most "meaningful" and these ratings were taken as a measure of the 
semantic appropriateness of the sentences. 



TABLE C-1: Forced-choice scores for the possible fused sentences.

    Sentence Pair                           No. Subjects     z        p

a.  THE MINISTER PRAYS FOR US*                   18         3.8    <.0001
b.  THE MINISTER PLAYS FOR US                     2

a.  THE TRUMPETER PRAYS FOR US                    1         4.0    <.0001
b.  THE TRUMPETER PLAYS FOR US*                  19

a.  THE TREES ARE GROWING AGAIN*                 18         3.8    <.0001
b.  THE TREES ARE GLOWING AGAIN                   2

a.  THE COALS ARE GROWING AGAIN                   1         4.0    <.0001
b.  THE COALS ARE GLOWING AGAIN*                 19

*Semantically appropriate sentences.


Phonological Fusion of Synthetic Stimuli in Dichotic and Binaural Presentation
Modes

James E. Cutting^ 

Haskins Laboratories, New Haven, Conn.



Phonological fusion occurs when the phonemes of two different speech stimuli
are combined into a single percept which contains all the linguistic information
from the two inputs. Thus, for example, PAHDUCT/RAHDUCT → PRODUCT (Day, 1968)
and BANKET/LANKET → BLANKET (Day, 1970).¹ The present study was designed to
observe fusion rates in both dichotic and binaural presentation modes, shown in
Figure 1. Previous studies (for example, Day, 1968, 1970; Cutting, 1973a) have
presented fusible stimuli dichotically, one stimulus to the right ear and the
other to the left. Such presentation appears to allow for the independent
processing of the two items before combining them into a perceptual whole (Cutting,
1973a). In the binaural mode, on the other hand, the independent extraction of
linguistic features from the two stimuli is not possible since the two stimuli
are electrically mixed and both are presented to each ear as part of the same
waveform.²

Method. Four stimulus sets of the general pattern were synthesized on
the Haskins Laboratories' parallel resonance synthesizer: the PAY set (PAY, RAY,
LAY), the BED set (BED, RED, LED), the CAM set (CAM, RAM, LAMB), and the GO set
(GO, ROW, LOW). All sets had been used previously by Cutting and Day (1972) and
Cutting (1973b, 1973c). Stimuli within a given set were identical in duration,
pitch, and intensity, and differed only in the formant transition requisites of
the first 150 msec. Each stimulus was highly identifiable when presented in
isolation (Cutting, 1973a, 1973b). Fusible pairs were constructed using stimuli
within the same set; for example, PAY/RAY and PAY/LAY. All stimuli and possible
fusions were high frequency English words (Carroll, Davies, and Richman, 1971).
Stimuli were digitized and stored on disc file for the preparation of dichotic
and binaural fusible pairs. Pairs of stimuli began simultaneously, or one



Also Yale University, New Haven, Conn.

¹The arrow should be read as "yields."

²Licklider (1951:1027) noted that binaural presentation is a general term
denoting the stimulation of both ears, while diotic presentation is a specific
term for the presentation of a single stimulus to both ears at the same time.
Since, in the present study, two stimuli are both presented to both ears, the
more general term binaural is used.

[HASKINS LABORATORIES: Status Report on Speech Research SR-34 (1973)]

stimulus preceded the other by 30 msec. LAY, for example, began before PAY, and
PAY before LAY on an equal number of trials. Channel assignments were
counterbalanced for each dichotic pair.

Twenty-four Yale University undergraduates listened to both dichotic and
binaural fusible items. Eight subjects listened first to a sequence of dichotic
trials, then to a sequence of binaural trials (Group 1), while eight others
listened in the reverse order (Group 2). The remaining subjects listened to a
tape consisting of dichotic and binaural trials randomly intermixed (Group 3).

Two tapes were prepared. The first tape consisted of 96 dichotically
recorded items: (4 sets of stimuli) x (2 stop/liquid pairs per set) x (3 lead
times) x (2 channel arrangements) x (2 observations per pair). The tape was
played on an Ampex AG-500 dual-track tape recorder, sent through a mixing box
and a listening station to Grason Stadler earphones (Model TDH39-300Z). At the
mixing box signals either remained separate (dichotic) or were mixed onto both
channels (binaural) according to the experimental condition. The second tape
was exactly twice as long (192 items) and consisted of a random sequence of all
possible dichotic and binaural pairs. Binaural items on this tape were
constructed as the tape was recorded. Subjects wrote down what they heard on each
trial.

Results. Fusion rates were much higher for dichotic pairs than for binaural
pairs: 45 percent and 15 percent, respectively. This 3:1 ratio was highly
significant (z = 4.8, p < .0001): all 24 subjects yielded fusion results in this
direction, and as shown in Figure 2, each group of subjects yielded fusion rates
indicative of this difference.

Fusion rates were highest for dichotic pairs in Group 1, when they were
presented in a blocked manner before the binaural trials, and lowest for Group 2,
when they were presented after the binaural block of trials. There was, however,
no significant difference among the three groups of subjects. For binaural
pairs, fusion rates were highest for Group 3 in the mixed presentation, and
lowest for Group 2 in the blocked presentation when they preceded the dichotic
trials. The difference between these two groups was significant [U(8,8) = 4,
p < .001], indicating that fusion rates for binaural pairs were increased when they
were presented in the same random sequence with dichotic fusible pairs, while
fusion rate for dichotic pairs remained relatively stable.

Stop + /r/ stimuli (for example, PAY/RAY) and stop + /l/ stimuli (PAY/LAY)
fused equally well. The fusion responses, however, showed that most fusions
were stop + /l/. In the fusion response, /l/ was often substituted for /r/
(PAY/RAY → PLAY), whereas the reverse substitution rarely occurred. Day (1968),
Cutting and Day (1972), and Cutting (1973a, 1973b, 1973c) have observed the
/l/-substitution effect. It cannot be accounted for by word frequency of the fusion
responses or cluster frequency of stop consonants and liquids, but it may be
linked to the specific cues in the dichotic situation (Cutting, 1973c). The /l/
substitutions occurred to a lesser degree in the binaural situation.

The effect of lead time was similar to that found in previous studies using 
these stimuli (Cutting, 1973b, 1973c): fusions occurred more readily when the 
stop stimulus began before the liquid than in either of the other two lead-time 
conditions. This effect occurred for both dichotic and binaural items. For 
further discussion of the effects of lead time see Cutting (1973a) and Day (in
preparation).

In both conditions, when fusion responses did not occur, typically only 
the stop stimulus was reported: for example, PAY/LAY → PAY.

Conclusion. The difference between dichotic and binaural fusion rates may
be interpreted as an indication that phonological fusion is a higher-level
process. Higher-level fusions, for example, appear to occur after the fusible
stimuli have been independently coded by more central processors (see Cutting,
1972, 1973a), suggesting that compatible linguistically coded information, and
not compatible raw inputs, interacts to form a higher-level fusion response.
In the binaural condition the electrical mixing of the fusible stimuli inhibits
this kind of fusion for the synthetic items used in the present study. Their
linguistic aspects may have been masked at a lower, more peripheral level so
that they reached the higher-level processors in a much degraded form unsuited
for fusion.
Phonological Fusion of Stimuli Produced by Different Vocal Tracts
James E. Cutting*

Haskins Laboratories, New Haven, Conn.



Phonological fusion is independent of many nonlinguistic constraints that
govern other types of auditory fusion (Cutting, 1972). For example, a fusible
pair consisting of PAY presented to one ear and LAY presented to the other ear
often yields PLAY, whether or not the two stimuli begin at the same time, share
the same fundamental frequency contour, or have the same peak intensity (Cutting,
1973). The present study explores another nonlinguistic dimension: the apparent
vocal tract size from which the stimuli were uttered.

Stimuli. Two fusion sets were synthesized on the Haskins Laboratories
parallel resonance synthesizer: the PAY set (PAY, RAY, LAY) and the KICK set
(KICK, RICK, LICK). All stimuli were highly identifiable, and both sets were
used by Cutting (1973) in observing the effects of other nonlinguistic
dimensions on phonological fusion. In the present study stimuli within a set were
identical in pitch, intensity, and duration, and differed only in the formant
structure of the first 150 msec. The PAY and KICK sets were 350 and 325 msec in
duration, respectively. Each item was synthesized in two versions: one as if
uttered by a normal adult male with a large vocal tract, and the other as if
uttered by a male midget. The small vocal tract stimuli were identical to the
large vocal tract stimuli except that the formants were 20 percent higher in
frequency, as shown in Figure 1. This change in formant frequency created
stimuli that would be uttered by a vocal tract diminished in all dimensions by a
factor of 1/6.
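
The link between a 20 percent rise in formant frequency and a one-sixth
reduction in tract size can be checked with a short sketch. It assumes an
idealized uniform tube (closed at the glottis, open at the lips), which is a
textbook simplification and not the rule set of the synthesizer used here; the
speed of sound and the 17.5 cm adult tract length are likewise assumed round
values, and all function names are illustrative.

```python
# Idealized model (an assumption, not the synthesizer's actual rules):
# a uniform tube closed at one end resonates at odd multiples of c / (4 * L),
# so every resonance is inversely proportional to the tube length L.
SPEED_OF_SOUND_CM_PER_S = 35000.0   # assumed round value


def tube_resonances(length_cm, count=3):
    """Return the first `count` resonances (Hz) of a uniform quarter-wave tube."""
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for k in range(1, count + 1)]


def length_ratio(formant_scale):
    """Tract-length ratio that multiplies every resonance by `formant_scale`."""
    return 1.0 / formant_scale


LARGE_TRACT_CM = 17.5               # assumed adult male tract length
small_tract_cm = LARGE_TRACT_CM * length_ratio(1.2)

large_formants = tube_resonances(LARGE_TRACT_CM)
small_formants = tube_resonances(small_tract_cm)

# Fraction by which the tract shrinks when formants rise by 20 percent.
reduction = 1.0 - length_ratio(1.2)
```

Under these assumptions the length ratio is 1/1.2 = 5/6, so the tract is
shortened by 1/6, which agrees with the figure given in the text.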

Tapes. The stimuli were digitized and stored on disc file for the preparation
of dichotic tapes. Dichotic pairs consisted of members of the same stimulus
set: for example, PAY/RAY and PAY/LAY. Members of a dichotic pair were
uttered by the same vocal tract or by different vocal tracts. For example,
"same" pairs were PAY-large/LAY-large (for "large" vocal tract) and
PAY-small/LAY-small. "Different" pairs were PAY-large/LAY-small and PAY-small/LAY-large.
Two tapes with different random orders included both types of pairs and
consisted of 96 dichotic items: (2 sets of stimuli) x (2 stop/liquid pairs per
set) x (4 combinations of vocal tract selections) x (3 lead times) x (2 channel
arrangements per pair). The lead times selected were simultaneous onset and
50 msec stimulus onset asynchronies. The two possible 50 msec leads (for
example, LAY beginning before PAY, and PAY beginning before LAY) occurred as
often as the simultaneous-onset case. The leads were used to make the present
study comparable to those in Cutting (1973). Channel arrangements for each pair
were counterbalanced.
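
The item count for these tapes can be reproduced by enumerating the design. The
sketch below is only an illustration: the KICK pairs are inferred by analogy
with the PAY pairs, and the sign convention for lead times and the channel
labels are hypothetical, since the text does not name them.

```python
from itertools import product

# Stop/liquid pairs per stimulus set; the KICK pairs are inferred by analogy.
STIMULUS_PAIRS = {
    "PAY": [("PAY", "RAY"), ("PAY", "LAY")],
    "KICK": [("KICK", "RICK"), ("KICK", "LICK")],
}
TRACTS = ("large", "small")
LEAD_TIMES_MSEC = (-50, 0, 50)           # hypothetical sign: negative = liquid leads
CHANNELS = ("stop-left", "stop-right")   # hypothetical labels for the two arrangements


def build_items():
    """Enumerate every dichotic item in the factorial design."""
    items = []
    for pairs in STIMULUS_PAIRS.values():
        for (stop, liquid), stop_tract, liquid_tract, lead, channel in product(
                pairs, TRACTS, TRACTS, LEAD_TIMES_MSEC, CHANNELS):
            items.append({
                "stop": stop,
                "liquid": liquid,
                "stop_tract": stop_tract,
                "liquid_tract": liquid_tract,
                "pair_type": "same" if stop_tract == liquid_tract else "different",
                "lead_msec": lead,
                "channel": channel,
            })
    return items


items = build_items()
same_count = sum(1 for item in items if item["pair_type"] == "same")
```

Enumerating the cells this way gives 2 x 2 x 4 x 3 x 2 = 96 items, half of them
"same" pairs and half "different" pairs.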



*Also Yale University, New Haven, Conn.
Figure 2: Results of the fusion task when the apparent vocal tract size of
the stimuli was varied. (Axis label: vocal tract size, "Same" versus "Different".)
Subjects, apparatus, and procedure. Twelve Yale University undergraduates
participated as subjects in this phonological fusion task. Each subject was a
right-handed native American English speaker with no history of hearing
difficulty. They listened in groups of four to dichotic tapes played on an Ampex
AG-500 dual-track tape recorder sent through an attenuator and listening station
to Grason-Stadler earphones (Model TDH39-300Z). Subjects were instructed to
write down what they heard: one word or two words, real words or nonsense.

Results. Fusion rate for "same" and "different" pairs was identical: 59
percent each, as shown in Figure 2. Furthermore, fusion rate was within a few
percentage points for all four combinations of vocal tract selections.

Other results show a pattern similar to that discussed in Cutting (1973).

a) Fusion rates for stop + /r/ and stop + /l/ stimuli were nearly identical.

b) Many perceptual substitutions occurred among the liquid phonemes; for
example, when PAY/RAY fused, PLAY responses were more frequent than PRAY
responses. On the other hand, when PAY/LAY fused, PLAY responses occurred nearly
all the time.

c) Fusions were more frequent when the stop stimulus (e.g., PAY) led the
liquid stimulus (e.g., LAY) than in the other two lead-time configurations.

d) Individual differences in fusion rate were consistent with previous
findings (Day, 1970; Cutting, 1973): some subjects fused at a high rate, others
fused at a relatively low rate, and few subjects fused at intermediate rates.

Overview. For phonological fusion to occur, fusible stimuli need not be
uttered by the same vocal tract. This result is a further indication that
phonological fusion is a higher-level process, not dependent on nonlinguistic
compatibility of the stimuli (see Cutting, 1973). Differences in vocal tract size
shift the entire formant structure of the to-be-fused stimuli, yet this
nonlinguistic variation has little, if any, effect on fusion rate. Fusion appears to
occur after the stimuli have been linguistically coded, and this coding process
appears to separate linguistic and nonlinguistic information.
Phonetic Prerequisites for First-Language Acquisition*

Ignatius G. Mattingly^ 

Haskins Laboratories, New Haven, Conn. 



In a well-known passage in his Aspects of the Theory of Syntax (1965:30), 
Chomsky lists a number of prerequisites for the infant speaker-hearer's
acquisition of competence in his native language. For each of these prerequisites,
Chomsky argues, there is an analogous requirement for linguistic investigation.
The first prerequisite is "a technique for representing input signals;" that is,
the infant, if he is to master his native language, must have at his command an
operational universal phonetics. Chomsky's other prerequisites have to do with
the infant's capacity to form, test, and select hypotheses about the grammar of
his language. 

Psycholinguists interested in first-language acquisition have traditionally 
paid rather less attention to Chomsky's first prerequisite than to the others. 
The research that has been done on the phonetic capacity of infants and young 
children has dealt more with the production than with the perception of speech, 
and has not been very much concerned with what may be called the Representation 
Problem: how a child "represents input signals." This bias is quite
understandable: experimental procedures can be much less sophisticated for the study of
production than for the study of perception. But more recently, a number of
experimental studies of infant speech perception have been carried out, using changes
in heart rate, sucking rate, or evoked potential to study the infant's ability
to recognize and discriminate among speech sounds (see Eimas, in press, for a
review).

The Representation Problem is actually only one of the problems that a 
satisfactory account of the infant's phonetic capacity must solve. It must also 
solve two earlier, logically prior problems, which may be called the Speech
Detection Problem and the Vocal Tract Problem.

The Speech Detection Problem may at first seem trivial. If the infant is 
to gather linguistic data from his environment, he must have a way of distin- 
guishing speech sounds from nonspeech sounds. If we choose, of course, we can 
regard this problem as simply a special case of the more general problem of 



*Based on talks given at the International Symposium on First-Language Acquisi- 
pattern recognition by humans, but then, we can also dispose of the whole
question of language acquisition in that way. Nor will it do to say that the infant
defines speech as those reassuring sounds coming from his mother's mouth (and if
he did, we should still have to explain how he arrives at this definition), for
many nonspeech sounds come from his mother's mouth, and there are in his
environment many other sources of speech, by no means limited to visible mouths. It
seems necessary to suppose that the infant has available some procedure that can
sort out speech from nonspeech, and we do not know what this procedure is. (For
what it is worth, the parallel engineering problem is likewise unsolved: there
exists no reliable device or algorithm for automatically sorting speech signals
from all nonspeech signals.) In fact, a similar procedure must be attributed to
the listener who is already linguistically competent, but the problem appears to
be less serious because the competent listener can conceivably apply linguistic
and semantic criteria to decide whether he is listening to speech, while the
incompetent infant cannot. But we do not know that the mature listener does
actually rely primarily on linguistic and semantic criteria, and the fact that he
can readily detect speech even when it is unintelligible, or in an unknown lan- 
guage, suggests that he must have phonetic criteria. Almost nothing is known 
about the Speech Detection Problem, though we shall mention later an observation 
that sheds a little light on speech detection by both adults and children. 

The second problem is the Vocal Tract Problem. How can the infant manage 
to extract useful linguistic data from the outputs of vocal tracts that, as 
Lieberman, Crelin, and Klatt (1972) have demonstrated, differ from one another 
and from his own immature vocal tract in shape, size, and acoustical properties? 
If we take the position that the linguistic information in speech is represented 
by acoustic correlates that are, at least at some level of abstraction, invari- 
ant, the problem is not so serious: all we have to do is to explain how the in- 
fant arrives at this level of acoustic invariance, and then, when he wishes to 
produce the same sounds, how he works out what the acoustical realization of 
this invariance would be in his own speech, even though he may never have heard 
another infant speak. But serious objections have been raised against a view of 
speech perception that depends on acoustic invariants. We need not go so far as 
to say that there are no acoustic invariants, but there are not enough of them, 
and even these available "invariants" sometimes vary. The speech signal simply
cannot be analyzed and segmented into units corresponding in any clearcut way 
with the discrete phonetic elements that the speaker believes he is producing 
and that the listener believes he is perceiving. Many of us have come to be- 
lieve that speaker and hearer deal not in acoustic invariants but in articula- 
tory events that are encoded in the acoustic signal by speech cues (Liberman, 
Cooper, Shankweiler, and Studdert-Kennedy, 1967). Thus the fact that /b, d, g/
are respectively labial, alveolar, and velar stop consonants is cued by
acoustically observable shifts in the resonances of the vocal tract. These shifts
reflect both the stop closure and where in the vocal tract this closure is made.
The listener, having tacit knowledge of the properties of the vocal tract (a
knowledge of the speech code, in other words), is able to recover the succession
of articulatory events. It follows from this view, often called "the motor 
theory of speech perception" (Liberman, Cooper, Harris, and MacNeilage, 1963), 
that in order for the infant to gather the data he needs, he, like other 
speaker-hearers, must know something about the acoustic coding of articulatory 
events, and he must be able to calibrate his perceptions for a particular vocal 
tract. 
But how is he to learn about vocal tracts? At one time it seemed
reasonable to suppose that he learned by studying his own vocal tract and applying
some simple mathematical transformation (Fant, 1953). But there are difficul- 
ties for this view. Medical cases have been cited of people with congenital 
gross damage to the vocal tract who do not necessarily have problems in the per- 
ception of speech (Lenneberg, 1967). The vocal tracts differ along not just one 
but a number of partially independent dimensions, e.g., the distance from larynx 
to palate or the distance from pharyngeal wall to lips. No very simple trans- 
formation will suffice. Lieberman et al. (1972) have shown that the vocal tract 
of an infant is a particularly poor guide to that of an adult: the larynx is
too high, the pharynx is too small, and the jaw is too big in proportion. It
seems more likely that his own vocal tract is itself a problem for the child. 
Even if he already has a general knowledge of vocal tracts, he must determine 
the individual characteristics of his own if he is to produce speech, and per- 
haps one of the things he is doing during the babbling stage is mapping his 
vocal tract. Since he can apparently use data from adult vocal tracts to guide 
production from his own vocal tract, which is not only very different from an 
adult's but is also changing its configuration rapidly during the period of 
first-language acquisition, we are forced to say that he must understand not 
only the physiology of the vocal tract but also something about its ontogeny. 

As for our third problem, the Representation Problem, there is both direct 
and indirect evidence that the capacity to perceive phonetic categories is 
innate. Abramson and Lisker (1970) have made an extensive cross-language investi- 
gation of the acoustic cue of voice onset time (VOT). VOT is the difference in 
time, positive or negative, between the instant of release of oral closure of 
stop consonants such as /p/ or /b/ and the beginning of laryngeal voicing. This 
speech cue occurs in a large number of languages. In English the labial stop
will be heard as an aspirated [pʰ] if the onset of voicing is delayed as much as
40 msec, but [b] for lower values of VOT. Thus +40 approximates the phoneme
boundary that separates /p/ and /b/. In other languages there is a second
boundary at about -30 msec; still other languages have three stops at the same
position of articulation, and use both phoneme boundaries. Moreover, when
subjects are asked to discriminate neighboring sounds along the VOT range, the
general finding is that they discriminate extremely well with stimuli close to
the phoneme boundaries of their language, and rather poorly elsewhere. Their
perception is categorical, as is the case with other speech cues (Liberman, 
Harris, Hoffman, and Griffith, 1957). The significant point, however, is that 
there seem to be only two possible VOT phoneme boundaries, regardless of lan- 
guage. This limitation would appear to be a linguistic universal, something 
that is part of any infant speaker-hearer's innate capacity to acquire language. 
What he has to learn is whether either or both boundaries are actually used in
his native language. 
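
The categorical pattern described above can be sketched as a toy model. The
boundary values (+40 and -30 msec) are the approximate ones quoted in the text;
the all-or-none discrimination rule, the function names, and the numeric
category indices are simplifying assumptions made for illustration, not a claim
about how listeners actually compute anything.

```python
from bisect import bisect_right

# Approximate VOT boundaries quoted in the text, in msec.
BOTH_BOUNDARIES_MSEC = (-30.0, 40.0)
ENGLISH_BOUNDARIES_MSEC = (40.0,)   # English is described as using only the +40 boundary


def category(vot_msec, boundaries):
    """Index of the phonetic category a VOT value falls into (0 = lowest VOT)."""
    return bisect_right(sorted(boundaries), vot_msec)


def predicted_discriminable(vot_a, vot_b, boundaries):
    """Toy all-or-none rule: two stimuli are told apart only across a boundary."""
    return category(vot_a, boundaries) != category(vot_b, boundaries)


# Two pairs with the same 40 msec difference in VOT:
across = predicted_discriminable(20.0, 60.0, ENGLISH_BOUNDARIES_MSEC)   # straddles +40
within = predicted_discriminable(60.0, 100.0, ENGLISH_BOUNDARIES_MSEC)  # same side
```

In this sketch a language that uses neither, one, or both boundaries is
represented simply by the tuple of boundaries passed in, which mirrors the
suggestion that the boundaries themselves are fixed and only their use is
learned.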

Eimas and his colleagues (Eimas, Siqueland, Jusczyk, and Vigorito, 1971; 
Eimas, in press) have made a more direct investigation of this question. They 
tested the ability of four-week-old infants to discriminate VOT differences.
Exploiting the fact that their subjects sucked more frequently if presented with
a perceptually novel stimulus, they found that the infants could discriminate 
more readily when the original and the novel stimuli were on opposite sides of 
the VOT boundary than when both stimuli, though differing by the same amount of 
VOT, were on the same side of the phoneme boundary. Thus it appears that
infants can detect at least one acoustic cue soon after birth. Eimas (in press)
and his colleagues have also studied perception of place of articulation, and
have obtained similar results.

Finally, let us return to the Speech Detection Problem. Offhand, one might 
suppose that there existed somewhere in the auditory system a device for
deciding whether what was being heard was nonspeech or speech. To be perceived as
speech, the signal would have to satisfy certain criteria of naturalness. If
the input signal were judged to be nonspeech, the speech processor system would
not be evoked and the information would be sent elsewhere. But if the signal
were judged to be speech, some, if not all, of the information in the signal
would be sent to the speech processor for further analysis.

This would seem to be a reasonable arrangement, provided we could conceive 
of a speech detector that was significantly less complex than a speech processor. 
But there is another possibility that occurred to us because of an unexpected
outcome of a recent experiment (Mattingly, Liberman, Syrdal, and Halwes, 1971;
see also Eimas, Cooper, and Corbit, in press). Subjects were asked to
discriminate a series of very simple synthetic stimuli which were supposed to be speech.
An examination of the data, however, revealed fairly clearly that the subjects
were hearing the stimuli sometimes as speech and sometimes as nonspeech. Our 
conclusion was that we had oversimplified our stimuli: they did not contain 
enough speech cues to sound like speech consistently. This occurrence suggests 
a different way of looking at the speech detection problem: no speech detector 
as such is required; rather, speech will be detected provided enough cues are
present to arouse the speech processor, even though the stimuli are not very 
natural sounding. That the brain may work in this way is very fortunate for the 
experimenter. Like an ethologist, he can present very simplified and hence per- 
haps very unnatural stimuli to his subjects and yet obtain valid results 
(Mattingly, 1972). It also suggests that the infant's "technique for
representing input signals" is a very robust process that is not easily upset by
confusing, inconsistent, or fragmentary input.
A Note on the Relation between Action and Perception*
M. T. Turvey^ 

Haskins Laboratories, New Haven, Conn.



I would like to explore the thesis that, in principle, the fundamental
problems to be solved and the concepts that will provide their solution are the same
for both the Theory of Action and the Theory of Perception. Consider the
following transcription task. A person sees a written capital A and is required to
respond by writing the letter that she has seen. There can be, of course,
different tokens of the letter seen, and there can be a variety of manners in which
our person is requested to make her written response. We may partition a
perceptual-motor occurrence of this kind into three phases. The first consists of
a set of functions that map states of the optic space (o) into states of the
perceptual space (p); the second consists of a set of functions that map states
of the perceptual space into states of the act space (a); and the third is a set
of functions that map states of the act space into states of the motor space (m).
We can represent the three phases as: F1 = {f | f: o → p}, F2 = {f | f: p → a}, and
F3 = {f | f: a → m}.
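
The three phases can be pictured as a chain of typed mappings. The following is
a purely illustrative sketch: the string encodings of the four spaces and the
three toy functions are hypothetical stand-ins chosen so the chain runs, not a
model of perception or action.

```python
from typing import Callable, NewType

# Hypothetical stand-ins for states of the four spaces named in the text.
Optic = NewType("Optic", str)      # o: state of the optic space
Percept = NewType("Percept", str)  # p: state of the perceptual space
Act = NewType("Act", str)          # a: state of the act space
Motor = NewType("Motor", str)      # m: state of the motor space


def perceive(o: Optic) -> Percept:
    """A toy member of F1: many optic tokens map to one letter identity."""
    return Percept(o.strip().upper()[:1])


def select_act(p: Percept) -> Act:
    """A toy member of F2: the percept determines the act to be performed."""
    return Act("write " + p)


def execute(a: Act) -> Motor:
    """A toy member of F3: one of many movement patterns realizing the act."""
    return Motor(a + " with the fingers")


def compose(f1: Callable[[Optic], Percept],
            f2: Callable[[Percept], Act],
            f3: Callable[[Act], Motor]) -> Callable[[Optic], Motor]:
    """Chain one function from each phase into a single o -> m mapping."""
    return lambda o: f3(f2(f1(o)))


transcribe = compose(perceive, select_act, execute)
result = transcribe(Optic("  a (slanted script)"))
```

Swapping `execute` for another member of F3 (for example, one that writes with
the whole arm) changes the motor state but leaves the act unchanged, which is
the kind of many-to-one relation the later discussion of constancy turns on.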

The thesis to be explored suggests that we should look for similarities be- 
tween the functions on the perceptual end and those on the action end of our 
transcription task, i.e., between F1 and F3. In addition, it suggests that we
should ask in what way(s) the perceptual and act spaces may be similar. The 
present paper is a response to these suggestions. 

The concept of action 

I am going to assume that our collective intuitions about the concept of 
perception will probably suffice for the present purposes. However, I will not 
make the same assumption for the concept of action, primarily because there has 
been far less hue and cry about this concept in theoretical psychology. Conse- 
quently, its character is less well articulated — witness the tendency to equate 
action with response in comparison to the tendency to equate perception with 
stimulus. We do the latter rarely, and the former frequently. Action, like
perception, is an abstract relation between the organism and the environment. Just
as we are unable to point to perception, we are unable to point to action,
although, of course, we might be able to detail the parameters of stimulation, and
to describe fully the muscles and joints involved in an exhibited movement.

A number of philosophers (see Care and Landesman, 1968) have sought to lay 
bare the concept of action. Here I will report only briefly on their endeavors,
merely to portray two rather important characteristics of the concept. The
first is that nerve impulses and muscle contractions, though necessary conditions
for action, are more accurately considered accompaniments of action than
characteristics of action. This point can be defended on at least two grounds:
first on the notion of intentionality, and second (to be explored at some length
below) on the issue of movement equivalence, or constancy. Clearly, it is
perfectly reasonable to say that one intends to kick a football, but it is not
reasonable to say that one intends to contract and to relax one's biceps femoris
and one's rectus femoris, respectively, to this or that degree. Generally one
cannot choose or intend to transmit a nerve impulse or to contract a certain set
of extrafusal and intrafusal muscle fibers. Intentionality is a defining
characteristic of acts, but not of muscle contractions.

Another defense of the notion that the concept of action cannot be reduced 
to bodily movement is that any particular constellation of muscular contractions
and joint motions brought into play when one performs an act (say, reaching for
and lifting up a cup) cannot be said to be identical to the act. A radically
different configuration of muscles and joints could just as easily have been
used to achieve the same result. An act is a Gestalt; that is to say, in a
variant of the hackneyed phrase of Gestalt psychology, an act is more than the sum
of its constituent movements.

The second important aspect of the concept of action is that consequences,
i.e., changes wrought in the environment by a configuration of movements, are
integral to the concept of action in that no reliable distinction can be drawn
between the concepts of action and consequence. Consider the following: George
kicks the football (of the round kind), and scores the goal that wins the
championship. Now we could say that George kicked the football and that a consequence
of this action was that a goal was scored. Or, we could say, just as
appropriately, that George scored a goal with championship-winning consequences. "Scores
the goal," therefore, can be viewed either as consequence or as action. We might
suppose that there are criteria available to determine what occurrences should
receive an action label, and what occurrences should receive a consequence label.
Unfortunately, the criteria that have been advanced have not been greeted with
universal approval.

The problem of constancy in perceiving and acting

Let us now return to the phases in transcription referred to above, in
particular F1 and F3. Consider that a visually presented capital A can occur in
various sizes and orientations and in a staggering variety of individual scripts.
Yet in the face of all this change, the identification of the letter remains 
constant; we see through the variations to the canonical form. 

This phenomenon of constancy is not limited to the domain of perception, but 
is equally characteristic of action. Thus, the letter A may be written without 
moving any muscles or joints other than those having to do with the fingers. Or,
it may be written through large movements of the whole arm with the muscles of
the fingers serving only to grasp the writing instrument. Or, more radically,
one can write the character without involving the muscles and joints of either
arms or fingers, by clenching the writing instrument between one's teeth or toes.
It is evident that a required result can be attained by an indefinitely large
class of movement patterns. 

On examination of the phenomenon of constancy we might raise the query: How
can these indefinitely large classes of possible shapes, and of possible movement
patterns, be stored in memory? The answer is that they are not. Clearly, I do
not have on record in memory all possible visual versions of A, since I have
never experienced most of them. And similarly, I do not have memorized all
possible temporal sequences of all possible configurations of muscle motions that
write A; indeed, I have yet to perform them and by all accounts I never will.
The essential question about our transcription task, therefore, can be stated
more fundamentally: How can I recognize and produce the indefinitely various
instantiations of A without previous experience of them?

In response to this question let us turn our attention to linguistic theory. 
The point of departure for transformational grammar is that our competency in
language is such that we can produce and understand a virtually infinite number
of sentences. As Weimer (1973) has pointed out, there are echoes of Plato's
paradoxes in Chomsky's (1965) claim that our competence in language vastly
outstrips our experience with it. Chomsky's claim is motivated by the observation
that experience with a limited sample of the set of linguistic utterances yields
an understanding of any sentence that meets the grammatical form of the language.
To explain this competency is, for Chomsky (1966), a central problem in the
Theory of Language. But given the points advanced above, the constancy function
in action and perception is likewise indicative of a competency that exceeds
prior learning. The child, we may note, learns to write A under conditions which
restrict her to a small subset of the very large set of A-writing movements. But
she is able subsequently to write A with practically any movement pattern she
chooses, i.e., she can write A in novel ways. Similarly, limited visual
experience with some A's is sufficient to allow the child to identify virtually any A.
Thus, acting and perceiving are creative in the sense that language is creative,
and I would submit, therefore, that the explanation of this creativity is central
to the theories of action and of perception, and at the very heart of our
understanding of perceptual-motor skill.

The search for a workable account of the creativity manifest in language has
led transformational grammarians to what has been aptly described as "the
explanatory primacy of abstract entities" (Weimer, 1973). The idea is that the
speaker-listener has at his disposal an abstract system of rules or principles
referred to as the deep structure that allows him to generate and to understand
an indefinitely large set of sentences referred to as the surface structure.
This distinction, drawn in linguistic theory, between deep and surface structure
applies to our present concerns in two important respects. The first is the
transformational grammarian's view that deep structure is far removed from
surface structure; it is argued that although the deep structure determines the
surface structure it is not manifested in the surface structure. The importance of
this view is that it concurs with Bernstein's (1967) general conclusion in his
classic analysis of the coordination and regulation of movement. Referring to
the engram or motor-image of an act Bernstein comments: "The higher engram,
which may be called the engram of a given topological class, is already
structurally far removed from any resemblance whatever to the joint-muscle schemata...."
(p. 49). The essence of Bernstein's (1967) view is that the central substrate
for a pattern of movements is a representation of the environment. Following
Evarts' (1967) work, Pribram (1971) has argued that the cortical representation
can be thought of as "a 'mirror image' of the field of external forces" (p. 246).
Thus, the underlying structure is best described as an Image-of-Achievement since
it encodes environmental contingencies. An interesting upshot of this view is
that action and consequence, which prove to be inseparable conceptually, are also
inseparable neurophysiologically.

The second characteristic of the surface-deep structure distinction I wish 
to touch upon is that the child must come to determine the nature of the
underlying deep structure from a limited experience with surface structures. It is
assumed by Chomsky and his colleagues that the child essentially "looks through"
the utterances she hears to the abstract form behind those utterances. The child
is said, therefore, to construct a theory of the regularities of her linguistic
experience. Similarly, the child seeing capital A's must determine an abstract 
representation that will extend over an indefinitely large set of instantiations 
of that character. And, by the same token, our hypothetical child learning to 
write the letter A must determine from her limited experience with the set of A- 
writing movements a theory of how to write A. The abilities to recognize in- 
definitely various A's, and to write A in indefinitely various ways are based on 
representations that are abstract and generative, like the grammar Chomsky has 
in mind for language. We should not be surprised by this conclusion: there is 
no reason why the nervous system should not solve similar problems in similar 
ways. 

The mathematical group as an example of an abstract structure 

Clearly, the form of the representation that allows for the writing of A in
novel ways is not motor. That is, it cannot be said to consist of programs of
muscle innervation. In the same way, the abstract representation that affords
the identification of novel A's cannot be sensory, i.e., it cannot be described
as any circumscribed set of sensory properties. We should note that the
constancy function in the identification and in the writing of A reveals an
indifference of both modes to metrical variation, and suggests rather strongly a
dependency of both modes on topological properties (Bernstein, 1967). Thus, common
to all capital A's viewed and written is that they are members of a single topo- 
logical class, while the differences between capital A's, both viewed and writ- 
ten, would be determined by topological differences of a higher order (Bernstein, 
1967). 

On the foregoing considerations we should argue that the action concept of 
A and the perception concept of A cannot be represented, respectively, as a 
particular aggregate of motor elements and as a particular aggregate of sensory 
elements. Instead, they are more accurately viewed as injunctions or rules 
specifying how a set of elements should relate, whatever those elements might be. 

In attempting to account for constancy in visual perception several students
of the problem have appealed to the mathematical concept of group (e.g., Cassirer,
1944; Pitts and McCulloch, 1947). Essentially, a group is any set or collection
of elements (and they need not be specified) which can be combined according to a
law such that any combination of them produces an element belonging to the set
itself. The set, therefore, is said to be self-contained or closed.

More formally, a group may be defined as a set G together with a composition 
rule which generates for each pair a and b of elements of G a third element ab of 
G for which the following conditions hold: 

1. The composition rule is associative: For any three elements a,
b, c of G: (ab)c = a(bc).

2. There exists an element 1 in G such that a·1 = 1·a = a. The
element 1 is known as the identity element.

3. To each element a in G there corresponds an inverse a^{-1} in G such
that: a·a^{-1} = a^{-1}·a = 1.

If we now define a generic concept as a group of transformations (Cassirer,
1944), then it can be said that there is a group G, which defines the action
concept of A and another group g, which defines the perception concept of A.
However, an interesting property of groups is that two groups can be isomorphic,
that is, they can represent the same abstract group, if the manner of internal
interlocking of elements is the same in both cases, even though the elements of
the two groups may differ radically from one another in other respects. Thus,
although the perception concept of A and the action concept of A would appear to
differ because of the different elements with which they work (sensory properties
on the one hand and muscle contractions on the other) they may, in fact, be
identical. The idea is that the two groups, G and g, which define the two
concepts, have the same internal structure. This speculative conclusion can be
stated more usefully as follows: the abstract structure that affords the
identification of optical instantiations of A also affords the production of motor
instantiations of A.
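The notion of two groups sharing one internal structure can be made concrete
with a small sketch. The two finite groups used here (the integers 0 to 3 under
addition modulo 4, and the four quarter-turn rotations written as complex
numbers) are illustrative assumptions chosen for brevity; they are not the
groups G and g of the argument, and the function names are hypothetical.

```python
from itertools import product


def is_group(elements, op):
    """Check the group conditions (closure, associativity, identity, inverses)."""
    elems = list(elements)
    # Closure: every combination stays inside the set.
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False
    # Condition 1: associativity.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return False
    # Condition 2: an identity element exists.
    identities = [e for e in elems
                  if all(op(e, a) == a and op(a, e) == a for a in elems)]
    if not identities:
        return False
    identity = identities[0]
    # Condition 3: every element has an inverse.
    return all(any(op(a, b) == identity and op(b, a) == identity for b in elems)
               for a in elems)


def is_isomorphism(mapping, op_g, op_h, target):
    """True if mapping is one-to-one onto target and preserves composition."""
    if len(set(mapping.values())) != len(target):
        return False
    return all(mapping[op_g(a, b)] == op_h(mapping[a], mapping[b])
               for a, b in product(mapping, repeat=2))


# Two groups whose elements differ radically (assumed examples).
Z4 = [0, 1, 2, 3]
ROTATIONS = [1, 1j, -1, -1j]


def add_mod4(a, b):
    return (a + b) % 4


def multiply(a, b):
    return a * b


# Pairing each count of quarter turns with the corresponding rotation.
PHI = {0: 1, 1: 1j, 2: -1, 3: -1j}
# A one-to-one pairing that does not respect the internal structure.
BAD = {0: 1, 1: -1, 2: 1j, 3: -1j}

print(is_group(Z4, add_mod4), is_group(ROTATIONS, multiply))
print(is_isomorphism(PHI, add_mod4, multiply, ROTATIONS))
print(is_isomorphism(BAD, add_mod4, multiply, ROTATIONS))
```

The first pairing preserves every composition while the second does not, which
is the sense in which the "manner of internal interlocking" matters and the
nature of the elements does not.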

Conclusion

I have considered certain characteristics of a "simple" transcription task 
in order to argue that the problems that beset the perception theorist and those 
that beset the action theorist are very similar in nature, and thus similar in 
the principles needed for their solution. On a less general level I have
speculated that for this transcription task (and I suspect for others) perception and
action may be related through a common abstract structure indigenous to neither.
And finally, I have expressed, implicitly, the view that the Theory of Action is 
as much a part of Cognitive Psychology as are the Theory of Perception and the 
Theory of Language. If you remain unconvinced of the abstract nature of the 
knowledge underlying so-called perceptual-motor activity, consider the following 
description of balancing on a bicycle presented by Michael Polanyi (1964). As 
the cyclist starts to fall to the right he turns the handlebars to the right, 
deflecting the bicycle along a curve to the right. The result of this maneuver 
is a centrifugal force pushing the cyclist to the left and offsetting the gravi- 
tational force pulling to the ground to the right. Consequently the cyclist is 
thrown out of balance to the left and responds by turning the handlebars to the 
left deflecting the bicycle along a curve to the left, which results in a centrif- 
ugal force pushing him to the right, etc., etc. In the course of these maneuvers 
the cyclist is obeying the following injunction: "adjust the curvature of the 
bicycle's path in proportion to the ratio of unbalance over the square of the 
speed." Keeping one's balance on a bicycle is a very cognitive act.

Reaction Times to Comparisons Within and Across Phonetic Categories: Evidence
for Auditory and Phonetic Levels of Processing*

David B. Pisoni^ and Jeffrey Tash^



Same-different reaction times (RTs) were obtained in a Posner-type
matching task to pairs of synthetic speech sounds ranging perceptually
from /ba/ through /pa/. Listeners were required to respond
"same" if both stimuli in a pair were the same phonetic segments
(i.e., /ba/ - /ba/ or /pa/ - /pa/) or "different" if both stimuli were
different phonetic segments (i.e., /ba/ - /pa/ or /pa/ - /ba/). RT
for "same" matches was faster to pairs of acoustically identical stimuli
(A-A) than to pairs of acoustically different stimuli (A-a) belonging
to the same phonetic category. RT for "different" responses
was slower for a two-step difference across the phonetic boundary
than for a four-step or six-step difference. These results provide
evidence for distinct auditory and phonetic levels of processing in
speech perception. Low-level acoustic information about stop consonants
may be available to listeners, but this is dependent on the level
of processing accessed by the particular information processing
task employed.

It is now well established that when listening in the speech mode a subject 
can identify the phonetic category of a sound but cannot discriminate between 
acoustically different sounds selected from within the same phonetic category 
(Liberman, Cooper, Shankweiler, and Studdert-Kennedy, 1967; Liberman, 1970;
Pisoni, 1971, 1973). This phenomenon, known as "categorical perception,"
appears to be unique to certain classes of speech sounds. In the idealized
case, two speech sounds can be discriminated only to the extent that they can
be identified as different on an absolute basis (Studdert-Kennedy, Liberman,
Harris, and Cooper, 1970). This contrasts with other kinds of auditory percep- 
tion where discrimination is better than absolute identification (Pollack, 1952, 
1953). 



*An earlier report of these findings was presented at the 85th meeting of the 
Acoustical Society of America, Boston, Mass., 11 April 1973. 

^Indiana University, Bloomington.

Auditory information from the earliest stages of perceptual processing of
speech sounds may be lost as a consequence of phonetic categorization. Thus,
acoustic information will be unavailable for use in a subsequent discrimination
task (Pisoni, 1973). The complexity of speech sounds and the status they have
as linguistic segments in language may force listeners to respond to these
sounds in an absolute sense, transforming the sounds into more durable phonetic
representations. Since the discrimination tasks employed in most speech
perception experiments place a heavy load on short-term memory, it is reasonable to
suppose that a phonetic representation would be stored in short-term memory in
preference to the auditory transform of the complex acoustic signal. Accordingly,
the observed "categorical" discrimination performance may not actually be
based on the specific acoustic properties of the stimuli, but rather, on a
higher, more abstract phonetic level of analysis. It is possible that information
from the earliest stages of processing might be available to a listener,
at least for a short period of time. However, the extent to which this relatively
unencoded, low-level information can be accessed will depend on a variety
of different factors including the stage or stages of perceptual analysis examined
by a particular information processing task.

The present study is concerned with how listeners go from one level of
perceptual analysis to another in speech perception and with what type of
information may remain from previous levels of analysis. Specifically, we were concerned
with determining whether listeners could respond to acoustic differences between
categorically perceived speech sounds or whether they could only process these
sounds on an abstract phonetic basis. The procedure used to investigate this
problem was the reaction time (RT) matching paradigm developed by Posner and his
associates (Posner and Mitchell, 1967; Posner, 1969; Posner, Boies, Eichelman,
and Taylor, 1969). This procedure provides an opportunity to examine the levels
of analysis at which comparisons are made by measuring the processing time 
required for different types of comparisons. 

Thus, when a listener is asked to determine whether two speech sounds are 
the "same" or "different," the time to arrive at a decision may reflect the 
level of perceptual processing and in turn the type of information required for 
a comparison. Some speech sounds may be compared directly, based on their
acoustical properties, while other stimuli may require a process of abstraction
where invariant features must first be identified before being compared (Posner
and Mitchell, 1967; Posner, 1969). Classifying two acoustically different
speech sounds as the "same" may be considered to involve matching abstracted
phonetic features at a higher level of perceptual analysis than classifying two
acoustically identical stimuli as the "same." The latter comparison could be
based on an earlier stage of analysis involving only the low-level acoustic
properties of the stimuli. 

Figure 1 shows a flowchart of a model of the stages of analysis involved in
this type of classification task. This model is adapted from Posner and
Mitchell's (1967) work on letter classification. 

On every trial a listener is presented with a pair of stimuli and is required
to determine whether the members of the pair are the "same" or "different."
Three types of stimulus pairs are shown at the top of the figure, A-A,
A-a, and A-B. The A-A pairs represent acoustically identical pairs of stimuli.
The A-a pairs represent acoustically different stimuli selected from within a
particular phonetic category. [In Posner's (1969) terminology, these would be
pairs of physically different stimuli with the same "name" code.] Finally, the
A-B pairs represent stimuli selected from different phonetic categories. These
are acoustically different and have different names.

Figure 1: Model of the stages of analysis involved in the "same" - "different"
classification task.

Depending on the particular type of stimulus pair, various predictions can
be made about the relative amount of time required for "same" matches and
"different" matches. For example, if low-level acoustic information can be accessed
for a comparison, reaction time should be faster for a "same" response when the
input pairs are acoustically identical (e.g., A-A) than when they are
acoustically different but phonetically the same (e.g., A-a). This should be true if
the acoustically identical pairs (e.g., A-A) could be matched as "same" at an
earlier stage of analysis than the acoustically different pairs (e.g., A-a).
The acoustically different pairs would require an additional stage of analysis
for a "same" response. However, if only an abstract phonetic representation is
used in the comparison, reaction times for a "same" match to these two types of
pairs should be identical. Under this assumption, a similar set of predictions
can also be made for the "different" matches. If distinct auditory and phonetic
levels of processing exist, pairs of stimuli with large physical differences
should be matched as "different" faster than pairs of stimuli with smaller
physical differences. If only an abstract phonetic representation is available,
reaction time for "different" matches should be equivalent, regardless of the
magnitude of the physical differences between pairs of stimuli.
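The two sets of predictions can be summarized in a minimal sketch. It is one
reading of the stage model, not the flowchart of Figure 1 itself: the function
name, the count of stages used as a stand-in for relative RT, and the threshold
at which an acoustic difference counts as "large" are all assumptions made for
illustration. The step differences (0, 2, 4, and 6) are those of the stimulus
pairs used in the experiment.

```python
def predict(step_difference, same_category, acoustic_access=True,
            large_difference=4):
    """Return (response, number_of_stages) for one stimulus pair.

    number_of_stages is a relative index of processing time, not a
    duration in msec. large_difference is a hypothetical threshold.
    """
    if acoustic_access:
        # Earlier stage: acoustically identical pairs exit as "same".
        if step_difference == 0:
            return "same", 1
        # Earlier stage: grossly dissimilar pairs exit as "different".
        if step_difference >= large_difference:
            return "different", 1
    # Later stage: compare abstracted phonetic features.
    return ("same" if same_category else "different"), 2


# Pair types: (label, step difference along the continuum, same category?)
PAIR_TYPES = [
    ("A-A", 0, True),
    ("A-a", 2, True),
    ("A-B two-step", 2, False),
    ("A-B four-step", 4, False),
    ("A-B six-step", 6, False),
]

for label, steps, same in PAIR_TYPES:
    two_level = predict(steps, same, acoustic_access=True)
    phonetic_only = predict(steps, same, acoustic_access=False)
    print(label, two_level, phonetic_only)
```

With acoustic access, A-A pairs and widely separated A-B pairs finish one stage
earlier than A-a pairs and two-step A-B pairs; with a phonetic representation
alone, every pair type requires the same number of stages.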

METHOD 

Subjects 

The listeners were nine paid volunteers, all of whom were either graduate 
students or staff members associated with the Mathematical Psychology Program at 
Indiana University. The Ss were right-handed native speakers of English and
reported no history of a hearing disorder or speech impediment. Ss were paid
for their services at the rate of $1.50 per hour. All Ss had had some previous
experience with synthetic speech stimuli, although they were naive to the exact 
purposes of the present experiment. 

Stimuli 

A set of bilabial stop consonant-vowel (CV) stimuli was synthesized on the
parallel resonance synthesizer at Haskins Laboratories. The basic set of stimuli
consisted of seven three-formant syllables 300 msec in duration. The stimuli
varied in 10-msec steps along the voice onset time (VOT) continuum from 0
through +60 msec, which distinguishes /ba/ and /pa/. VOT has been defined as
the interval between the release of the articulators and the onset of laryngeal
pulsing or voicing (Lisker and Abramson, 1964). Synthesizer control parameter
values for these stimuli were similar to those employed by Lisker and Abramson
(1970) in their cross-language experiments. The final 230 msec of the CV
syllable was a steady-state vowel appropriate for an English /a/. The frequencies
of the first three formants were fixed at 769, 1,232, and 2,525 Hz
respectively. During the initial 50-msec transitional period, the first three
formants moved upward toward the steady-state frequencies of the vowel. For
successive stimuli in the set, the delay in the rise of F1 to full amplitude
(i.e., the degree of F1 "cutback") and in the switch of the excitation source
from hiss (aperiodic) to buzz (periodic) was increased by 10 msec. Simultaneous
changes in amplitude in the lower frequency region and type of excitation source
have been shown to characterize the voicing and aspiration differences between
/b/ and /p/ in English (Lisker and Abramson, 1967).

Experimental Materials 

All stimuli were digitized and their wave forms stored on the Pulse Code
Modulation System at Haskins Laboratories (Cooper and Mattingly, 1969). Two
types of audio tapes were prepared under computer control: an identification
test and a matching test. A 1,000 Hz tone of 100 msec duration was recorded
500 msec before the onset of each trial. This tone served as a warning signal
for the S and was also used to trigger a computer interrupt which initiated
the timing of response latency.

Two different 140-trial identification tests were prepared. Each test
contained 20 different randomizations of the seven stimuli. Stimuli were recorded
singly with a 3-sec interval between presentations. Each stimulus occurred
equally often within each half of the tape. 

Four different "same" - "different" matching tests were constructed. Each
test tape contained 48 pairs of stimuli. Half of all the trials consisted of
within-category pairs requiring a "same" response while the other half consisted
of across-category pairs requiring a "different" response. Figure 2 shows the
arrangement of the stimulus conditions employed in the present experiment. 

Within-category pairs were either physically identical (A-A) or physically
different (A-a). A-A trials consisted of stimuli 1, 3, 5, and 7, each paired
with itself (i.e., 1-1, 3-3, 5-5, 7-7). The A-a trials, which were separated
by two steps along the continuum or +20 msec VOT, consisted of the stimulus
pairs 1-3, 3-1, 5-7, and 7-5. 

Across-category pairs (A-B), which were always physically different, were
separated by two, four, or six steps along the continuum. These comparisons 
represented differences of +20, +40, or +60 msec VOT respectively. 

Each of the eight within-category comparisons appeared three times within
a block of 48 trials, whereas each of the six between-category comparisons
appeared four times. The interstimulus interval (ISI) between members of a pair
was held constant at 250 msec. Successive trials were separated by 4 sec. 
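The composition of one 48-trial block can be checked with a short sketch. The
within-category pairs are those listed above. The six across-category pairs are
an assumption: the text gives only their step sizes, so the sketch takes pairs
3-5, 2-6, and 1-7 in both orders, straddling a boundary near stimulus 4. The
variable and function names are hypothetical, and randomization of trial order
is omitted.

```python
from collections import Counter

# Stimuli 1..7 span 0 to +60 msec VOT in 10-msec steps.
VOT_MS = {n: 10 * (n - 1) for n in range(1, 8)}

A_A = [(1, 1), (3, 3), (5, 5), (7, 7)]   # physically identical
A_a = [(1, 3), (3, 1), (5, 7), (7, 5)]   # two steps apart, same category
# Assumed across-category pairs (two, four, and six steps apart).
A_B = [(3, 5), (5, 3), (2, 6), (6, 2), (1, 7), (7, 1)]


def build_block():
    """Return one unshuffled block of (pair, correct_response) trials."""
    trials = []
    for pair in A_A + A_a:
        trials += [(pair, "same")] * 3        # each within pair three times
    for pair in A_B:
        trials += [(pair, "different")] * 4   # each across pair four times
    return trials


def vot_difference(pair):
    """Absolute VOT difference in msec between the members of a pair."""
    first, second = pair
    return abs(VOT_MS[first] - VOT_MS[second])


block = build_block()
responses = Counter(response for _, response in block)
across_steps = Counter(vot_difference(pair) for pair, r in block if r == "different")

print(len(block), dict(responses), dict(across_steps))
```

Under these assumptions the block holds 24 "same" and 24 "different" trials,
with the "different" trials divided equally among the +20, +40, and +60 msec
differences.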

Procedure 

The experimental tapes were reproduced on an Ampex AG-500 two-track tape
recorder and were presented diotically through Telephonics (TDH-39) matched and
calibrated headphones. The gain of the tape recorder was adjusted to give a
voltage across the earphones equivalent to 70 db SPL re 0.0002 dynes/cm^2 for a
1,000 Hz calibration tone. Measurements were made on a Hewlett Packard VTVM
(Model 400) before the presentation of each experimental tape. Ss were run
individually in a small experimental room. All responses and reaction times were
recorded automatically under the control of a PDP-8 computer located with the
tape recorder in an adjacent room.

The instructions for the identification test were similar to those used in
previous speech perception experiments. Ss were required to identify each
stimulus as either /ba/ or /pa/ and to respond as rapidly as possible. The Ss
responded to each stimulus by pressing one of two labeled telegraph keys. For
a given S, one hand was always used for a /ba/ response while the other hand was
used for a /pa/ response. The two keys were counterbalanced for hands across
Ss.

For the matching task, Ss were told that they would hear a pair of stimuli
on every trial and their task was to decide whether the two stimuli were the
"same" or "different" phonetic segments. The type of instructions employed here
is similar to the "name match" instructions employed by Posner and Mitchell
(1967). Ss were told that half of all the pairs were the same (e.g., /ba/ -
/ba/ or /pa/ - /pa/) and half of the pairs were different (e.g., /ba/ - /pa/ or
/pa/ - /ba/). Ss were encouraged to respond as rapidly as possible. As in the
identification task, Ss responded to each pair by pressing one of two telegraph
keys, labeled "same" and "different." The response keys were also
counterbalanced for hands across Ss.

Ss were tested for an hour a day on two consecutive days. Each session
began with a 140-item identification test which was followed after a short break
by two 48-trial matching tests. Since the first day served as a practice
session, only the identification and matching data from the second session will
be considered in the remainder of this report.

RESULTS AND DISCUSSION 

Identification Task 

The average identification function is shown in Figure 3 along with the 
mean RT for identification. Each point represents the mean of 180 responses 
over the nine Ss.

The filled squares and open circles show percent /ba/ or /pa/ response
respectively to each of the seven stimuli in the continuum. The filled triangles
represent the corresponding latency of identification response to each stimulus.
Examination of Figure 3 indicates that the identification function is quite
consistent. Ss partitioned the stimulus continuum into two discrete phonetic
segments. The phonetic boundary or cross-over point in identification is at about
+30 msec VOT which is consistent with previous findings (Lisker and Abramson,
1967).

Inspection of the RT function during identification shows that Ss are
slowest for stimulus 4 which is at the phonetic boundary and fastest for the other
stimuli which are within phonetic categories. These results are also consistent
with the findings reported by other investigators who have studied reaction time
in the identification of synthetic speech sounds (Studdert-Kennedy, Liberman,
and Stevens, 1963). Reaction time is a positive function of uncertainty,
increasing at the phonetic boundary where identification is least consistent and
decreasing where identification is most consistent. In anticipation of the
discrimination tests, it is noted that identification time is slowest for the
stimulus region where discrimination is best.

Matching Task



The major results of the "same" - "different" classification task are shown
in Figure 4.

The mean RT for each of the two types of "same" trials (A-A, A-a) is based
on a total of 216 judgments, while the mean RT for each of the three types of
"different" pairs is based on 144 judgments averaged over nine Ss.

An examination of the "same" responses reveals that subjects are faster for
pairs of acoustically identical stimuli (e.g., A-A) than for pairs of
acoustically different stimuli (e.g., A-a) which have been selected from within a
phonetic category. The 41 msec difference between these two conditions is
highly significant (p < .01) by a correlated t-test, t(8) = 3.20 (one-tailed).
This result is consistent with the model described earlier. Ss can access
low-level acoustic information even though they have categorized these pairs of
stimuli as the "same" phonetic segments. Thus, "same" matches to acoustically
identical speech sounds are presumably based on an earlier stage of perceptual
analysis than "same" matches to acoustically different speech sounds. In the
latter case, the "same" response is based on a comparison of the phonetic
features of each stimulus which must have been extracted before a match could have
been made. It may be assumed that the abstraction of phonetic features from
the acoustic signal requires an additional amount of processing time. This is
presumably responsible for the difference in RT between the two within-category
conditions.

The present findings, based on "same" responses to within-phonetic category
comparisons, indicate that even perception of stop consonants may not be entirely
categorical, as previously supposed. Rather, the degree of categorical perception
will depend upon the extent to which low-level acoustic information can be
utilized within the experimental task. Since acoustic information not only
decays rapidly over time but also is highly vulnerable to various types of
interfering stimuli, the specific discrimination procedure may be crucial in
determining the relative roles of acoustic and phonetic information in speech
sound discrimination. For example, the ABX procedure may force the listener to
rely almost entirely on phonetic information in discrimination because of the
arrangement of stimuli in this procedure.

One additional point should be emphasized here concerning the within-phonetic
category comparisons. It could be argued that, in the present experiment,
Ss were responding to these stimuli as isolated acoustic events rather than as
speech sounds. If so, the proportion of "same" responses should have been quite
different for A-a pairs and A-A pairs. In fact, P("same" | A-a) and P("same" |
A-A) were almost identical, suggesting that Ss were responding to these stimuli
as speech sounds.

The mean RTs for "different" responses to the three types of across-category
pairs are also shown in Figure 4. An examination of these RTs provides
additional support for the argument that Ss can employ relatively low-level
acoustic information in the comparison process. RT for "different" matches is
slower for a two-step difference than for a four-step or six-step difference
across the phonetic boundary. Both differences are significant (p < .005)
by correlated t-tests, t(8) = 4.95; t(8) = 4.82, respectively. These findings
suggest that "different" matches may not be based solely on an abstract phonetic
representation. Rather, a "different" response to pairs of stimuli across
category boundaries may also be based on low-level acoustic information at an
earlier stage of perceptual analysis. Pairs of stimuli which are separated by
large physical differences in VOT, such as 1-7 and 2-6, can be differentiated
solely on the basis of their acoustic dissimilarity. Stimulus pairs separated
by small differences in VOT, such as 1-3 and 3-5, cannot be differentiated on
the basis of their acoustic similarity and an additional stage of analysis is
required. Since an initial decision cannot be made reliably on the basis of
acoustic information alone, the "different" decision for pair 3-5 must,
therefore, be based on a comparison of the phonetic features of the two stimuli.
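The significance levels reported for the matching task can be checked against
Student's t distribution with 8 degrees of freedom (nine Ss in a correlated
test). The following is a minimal sketch that integrates the t density
numerically with Simpson's rule using only the standard library; the function
names and the number of integration intervals are assumptions of the sketch,
and only the t values are taken from the text.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, intervals=2000):
    """One-tailed P(T > t) for t >= 0, by Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if intervals % 2:
        raise ValueError("intervals must be even")
    h = t / intervals
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so half of its mass lies above zero.
    return 0.5 - total * h / 3


# t values reported in the text, each with 8 degrees of freedom.
REPORTED = {
    "A-A vs. A-a": 3.20,
    "two-step vs. four-step": 4.95,
    "two-step vs. six-step": 4.82,
}

for comparison, t_value in REPORTED.items():
    print(comparison, t_value, upper_tail_p(t_value, df=8))
```

The one-tailed probability for t(8) = 3.20 falls below .01, and the
probabilities for 4.95 and 4.82 fall below .005 whether one or two tails are
considered, in agreement with the levels stated.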

In summary, the results suggest that low-level acoustical information about 
a speech stimulus may be available to listeners along with a more abstract pho- 
netic representation, even in the case of stop consonants. Presumably the
extent to which low-level information can be accessed will depend not only on the
particular level of perceptual analysis examined but also on the type of
information processing task employed.

The results of this experiment argue for a diversity of experimental tasks
in the study of speech sound perception. On the basis of the distribution of
responses alone, we might conclude that only a categorical or phonetic analysis
is available for stop consonants. The addition of the RT task reveals another
level of analysis. A view of speech sound perception entailing a series of
interrelated stages of analysis could serve as the framework for determining
quantitatively the ways in which speech perception may involve specialized
mechanisms for perceptual analysis. Moreover, such an approach may help to
determine the ways in which various speech perception phenomena may conform to
more general perceptual processes.

The Role of Auditory Short-Term Memory in Vowel Perception*
David B. Pisoni^



ABSTRACT 

The distinction between categorical and continuous modes of
speech perception has played an important role in recent theoretical
accounts of the speech perception process. Certain classes of speech
sounds such as stop consonants are usually perceived in a categorical
or phonetic mode. Listeners can discriminate between two sounds only
to the extent that they have identified those stimuli as different
phonetic segments. Recently, several findings have suggested that
vowels, which are usually perceived in a continuous mode, may also be
perceived in a categorical-like mode, although this outcome may be
dependent upon various experimental manipulations. This paper
reports three experiments that examined the role of auditory short-term
memory in the discrimination of brief, 50 msec vowels and longer, 300
msec vowels. Although vowels may be perceived in a categorical-like
mode, differences still exist in perception between stop consonants
and steady-state vowels. The findings are discussed with regard to
auditory and phonetic coding in short-term memory.

A basic assumption underlying recent theoretical work in speech perception
has been that the perception of speech sounds involves processes and mechanisms
that are somehow basically different from the processes involved in the
perception of other auditory stimuli. One line of evidence cited in support of this
view concerns the identification and discrimination of various classes of
synthetic speech sounds (see Liberman, Cooper, Shankweiler, and Studdert-Kennedy,
1967). Some classes of speech sounds, such as stop consonants, have been found
to be perceived in a categorical mode; listeners can discriminate between two
acoustically different stop consonants only to the extent that they can identify
the stimuli as different on an absolute basis (Liberman, Harris, Hoffman, and
Griffith, 1957; Liberman, Harris, Kinney, and Lane, 1961; Mattingly, Liberman,
Syrdal, and Halwes, 1971; Pisoni, 1971). In contrast, other classes of speech
sounds such as steady-state vowels have been found to be perceived in a
continuous mode; listeners can discriminate among many more vowels than would be
expected on the basis of absolute identification alone (Fry, Abramson, Eimas, and
Liberman, 1962; Stevens, Liberman, Studdert-Kennedy, and Ohman, 1969; Pisoni,
1971).

*A short report of some of these findings was presented at the 85th meeting of

Differences in perception between stop consonants and steady-state vowels
have not only played a central role in theoretical accounts of speech perception
(see Liberman et al., 1967; Liberman, Mattingly, and Turvey, 1972;
Studdert-Kennedy, 1973), but have also been implicated in several recent studies dealing
with immediate recall of these two classes of speech sounds. For example,
Crowder (1971, 1973a, 1973b) has reported that for lists of stop-consonant vowel
syllables presented auditorily, a recency effect is observed in immediate serial
recall if the syllables in the list contrast only on vowels (e.g., /bi/, /ba/,
/bu/); however, the recency effect is curiously absent if the syllables contrast
only on the stop consonants (e.g., /ba/, /da/, /ga/). The recency effect
describes an advantage in recall of the last serial position over the
second-to-last serial position in a list of items.

Crowder (1971, 1973a, 1973b) also reports two other differences in immediate
memory for stop-consonant vowel stimuli: a modality effect and a suffix
effect. The modality effect refers to the advantage of auditory over visual
presentation for recall of items from later serial positions of a list. This
modality effect has been observed for vowels but not for consonants. On the
other hand, the suffix effect refers to a decrease in performance for items at
the end of a list when a redundant word is presented after the last item in that
list. The suffix effect has also been found with vowels but not with stop
consonants. All three findings (the recency effect, the modality effect, and the
suffix effect) are characteristic of a form of auditory memory called
precategorical acoustic storage (PAS) by Crowder and Morton (1969). They argue that this
form of memory holds some relatively unanalyzed representation of an acoustic
stimulus for approximately 2 sec. This form of "sensory" memory has been
discussed recently by Massaro (1972a, 1972b) and it should be distinguished from
his preperceptual auditory store which holds acoustic information for a much
shorter period of time (i.e., 250 msec) and which has distinctly different
properties.

In the original study by Crowder (1971), information about the vowel and
consonant was confounded by their position within the syllables. The stop
consonants were in initial position in the syllable and the vowels were in final
position. Similar results, however, have been reported by Cole (1973), who found
that consonants show less of a recency effect than vowels, regardless of the
initial or final position in the syllables of the critical to-be-remembered
information (e.g., /ba/ vs. /ab/). Crowder (1973a) recently replicated these
results in a study which controlled for position of the information within the
syllable. Both Crowder (1971, 1973a) and Cole (1973) have explained the
differences in recency effects for consonants and vowels in terms of differences in
auditory memory for these two classes of speech sounds. Thus, they assume that
the recency effect for the vowels is due to retrieval of some auditory
representation for vowels from a sensory memory store such as Crowder and Morton's PAS
system. Crowder (1971, 1973a, 1973b) has been somewhat more specific and further
suggests that auditory information in vowels, but not in stop consonants, is
represented in PAS.

Crowder (1971, 1973a, 1973b), Liberman et al. (1972), and Cole (1973) have
all noted the parallel between the differences in perception of stop consonants
and vowels (the categorical vs. continuous distinction) and the differences in
serial recall for these two types of stimulus vocabularies. These investigators
have suggested that the differences in immediate recall may in fact be due to
differences in perceptual processing for these two classes of speech sounds.
For example, in discussing these results Liberman et al. (1972) state:

... the difference in recency effect between the stops and 
vowels is exactly what we would expect.... the special pro- 
cess that decodes the stops strips away all auditory infor- 
mation and presents to immediate perception a categorical 
linguistic event the listener can be aware of only as (b,d, 
g,p,t,k). Thus, there is for these segments no auditory, 
precategorical form that is available to consciousness for 
a time long enough to produce a recency effect. The rela- 
tively unencoded vowels, on the other hand, are capable of 
being perceived in a different way. Perception is more 
nearly continuous than categorical.... the auditory
characteristics of the signal can be preserved for a while
(p. 329). 

The position described by Liberman et al. (1972) appears to be a reasonable
explanation of the results showing differences in serial recall between
consonants and vowels. We take these findings as being generally consistent with
the assumption that the differences are perceptual in nature, presumably
occurring at a relatively early stage of perceptual analysis. However, many of
the perceptual studies dealing with the identification and discrimination of
consonants and vowels have not been very specific about where the differences
between these two classes of sounds occur during perceptual processing. In
addition, although one might want to argue that the recall findings are due to
differences in perceptual processing for consonants and vowels, some recent
findings seem to indicate that vowels may also be perceived categorically, much
like stop consonants. If vowels are perceived categorically in the same way and
by the same mechanisms as stop consonants, we are clearly faced with somewhat of
a dilemma in trying to account for the serial recall data by reference back to
the perceptual findings. One way to deal with this problem would be to
demonstrate that the categorical perception findings for the vowels are both
qualitatively and quantitatively different from those obtained for the stop
consonants.

In several previous reports, Pisoni (1971, 1973) has suggested that the
major differences in discrimination between stop consonants and steady-state
vowels are to be found in an examination of within-phonetic category comparisons.
Discrimination performance for the putative categorically perceived vowels is well
above chance within phonetic categories, suggesting an auditory as well as
phonetic basis for a discrimination decision. The situation for the stop
consonants is quite different. Under identical experimental conditions, subjects
apparently cannot retrieve the auditory information needed for a within-phonetic
category decision with the consonants (Pisoni, 1973). This paper describes a
revised model of the perceptual processes involved in the ABX test based on
Fujisaki and Kawashima (1970) and then reports a series of experiments dealing
with the discrimination of steady-state vowels. The major purpose of these
studies was to make explicit some of the differences between the type of
categorical-like perception recently observed with vowels and the type of
categorical perception typically observed with stop consonants.

Auditory and Phonetic Memory Codes 

Since Fujisaki and Kawashima's (1970) findings on categorical-like perception
of vowels are central to a number of theoretical efforts in speech perception
(Pisoni, 1973; Studdert-Kennedy, 1973), we consider some of their results
and a modified version of their original model of the perceptual processes
involved in the ABX discrimination test.

Fujisaki and Kawashima (1968, 1969, 1970) proposed a two-stage model of
categorical perception, a model based on a distinction between auditory and
phonetic information in short-term memory (STM). The model assumes that
differences in discrimination between classes of speech sounds are due to the
degree to which auditory and phonetic information is employed in the decision
process in discrimination. Although not explicitly described by Fujisaki and
Kawashima, we assume, following Studdert-Kennedy (1973), that auditory
information is coded in short-term memory subsequent to an analysis of the
acoustic waveform into a set of time-varying psychological dimensions such as
pitch, loudness, and timbre. Similarly, we assume that phonetic information is
coded as abstract phonetic features in STM after the "auditory" dimensions have
made contact with some type of representation generated from synthesis rules
residing in long-term memory (LTM).

The basic model proposed by Fujisaki and Kawashima (1969, 1970) is shown
with several additions and modifications in Figure 1. As shown, the model
applies to discrimination exclusively within the ABX discrimination format, but
the same assumptions could be adapted to other discrimination procedures.



We assume that the encoding of speech sounds involves information about both the
phonetic features of the stimulus and the auditory properties of the acoustic
input. Furthermore, auditory information at relatively early stages of
processing may be lost more rapidly from STS than the higher order phonetic
information. According to this view, both an auditory and a phonetic
representation are present in STS; the comparison process in ABX discrimination
entails the retrieval of either the auditory trace or the phonetic code for a
correct decision. One consequence of this view would be that if a S must base a
decision on auditory information, he would be more likely to show better
performance in an ABB triad than an ABA triad. This should be true for two
reasons. First, the two stimuli in the ABB triad are closer together in time.
Second, there is no interfering acoustic event in the comparison (retroactive)
interval as is the case with the ABA triad. Glanzman and Pisoni (1973) examined
these two types of comparisons in their data and found exactly these results for
both stop consonant ABX discrimination (along the VOT continuum) and vowel ABX
discrimination. Crowder (1973b) has recently alluded to this same observation.

According to this model, when a listener is required to discriminate between two
different phonetic types, the decision in the discrimination task is based on
phonetic information coded in STM. These derived phonetic properties or features
of the auditory stimuli reside in phonetic short-term store. In this case, the
listener determines whether the first two stimuli (i.e., A and B) are different
phonetic segments. Since A and B are different phonetic segments, the listener's
decision about X is based exclusively on a comparison of the phonetic information
coded in STM. Thus, he compares X with B and X with A and then determines which
is the closest match.

However, the situation is somewhat different when the listener is required
to discriminate between two identical phonetic types; that is, two that are
acoustically different but that have been drawn from the same phonetic category.
Now the listener must rely exclusively on the auditory information for each
stimulus coded in STM. In order to arrive at a correct decision in the
discrimination task, the listener must retrieve and compare with stimulus X the
auditory representations of the two stimuli in auditory short-term store, since
the two stimuli, A and B, were not originally identified as different phonetic
segments. The listener must make a comparative judgment based on auditory
information of the acoustic properties of these stimuli rather than an absolute
judgment based on the phonetic features.

The basic model first developed by Fujisaki and Kawashima (1969, 1970) and
expanded here predicts that categorical perception is related to the degree to
which auditory and phonetic information in STM can be employed in the decision
process during ABX discrimination. It has been reported that the major
differences in discrimination between stop consonants and steady-state vowels
appear to be related to differences in retrieval of auditory rather than phonetic
information from STM (Fujisaki and Kawashima, 1970; Pisoni, 1971, 1973; Pisoni
and Lazarus, 1973). But the extent to which auditory and phonetic information is
encoded and later retrieved from STM will depend on a number of factors: for
example, the duration of the critical information in the signal; the acoustic
environment or context of the cues; whether the acoustic cues are steady-state or
transient; and the particular information-processing task. All these factors
should presumably influence the way auditory and phonetic information is used by
the decision rules in discrimination.

The experiments reported in this paper are concerned with three related
questions about vowel discrimination and the role of auditory STM in speech
perception. First, what effect does duration have on vowel discrimination?
Fujisaki and Kawashima (1970) found that isolated steady-state vowels of very
brief duration (50 msec) tend to be perceived in a categorical-like mode; there
was a peak across the phonetic boundary and a trough within phonetic categories.
However, although Fujisaki and Kawashima showed that perception of vowels was
more nearly categorical at short durations, they did not employ stimuli with
durations comparable to the earlier vowel perception studies conducted at
Haskins Laboratories (Fry et al., 1962; Stevens et al., 1969). It is possible
that the longer vowels of 300 msec also may be perceived in a somewhat
categorical-like mode.

The second question deals with the effect of context. What role does the
immediately surrounding acoustic environment have on vowel discrimination? Stop
consonants always occur in syllabic context. Moreover, there is a relatively
complex relation between perceived phonetic segment and its representation in
the acoustic signal; the essential acoustic cue for the stop consonants is a
rapidly changing spectrum (F1 and F2 transitions) which is both short in duration
(50 msec) and transient in nature (Liberman, Delattre, Cooper, and Gerstman,
1954). In contrast, the major acoustic cue for vowels, the frequencies of the
first three formants, has a relatively long duration and is relatively uniform
over the entire length of the stimulus (Delattre, Liberman, Cooper, and Gerstman,
1952). Stevens (1968), Sachs (1969), and Fujisaki and Kawashima (1969, 1970)
have all found that vowels tend to be perceived more categorically when they
appear in a fixed context than when the same stimuli are presented in isolation.
Fujisaki and Kawashima suggested that the context served as a "perceptual
reference" or anchor. However, it could be argued that the context selectively
interfered with retention of the auditory information in target vowels. If this
interference hypothesis is correct, vowel discrimination should be poorer when a
reference context follows a target vowel (retroactive interference) than when it
precedes it (proactive interference). We assume that the retroactive context
acts to interrupt the processing of the target vowel as well as more generally
to mask transitional information in a stop-consonant vowel syllable. In
addition, the amount of interference should be related to the similarity of the
target sound and context. For example, vowels should suffer more interference
from other vowels than from tones or white noise (Darwin, 1971).

Finally, the third question deals with the ABX discrimination test that
Fujisaki and Kawashima (1969, 1970) employed in their experiments on vowels. Is
the categorical-like discrimination found with vowels in the ABX test also found
more generally with other discrimination procedures (Pisoni, 1971)? We consider
discrimination performance with vowels in another test procedure, the 4IAX test
of paired similarity, which was introduced in another report (Pisoni, 1971).
If there are large differences between the ABX and the 4IAX test for both short,
50 msec vowels, and longer, 300 msec vowels, we will have additional evidence for
suspecting that the type of categorical perception observed for vowels is somehow
both qualitatively and quantitatively different from that observed for the stop
consonants.

EXPERIMENT I 

In this experiment we compare discrimination of short (50 msec) vowels with
longer (300 msec) vowels. The major aim of the study was to replicate and
extend the findings of Fujisaki and Kawashima (1970) and Pisoni (1971), who
reported that vowels are perceived as more nearly categorical at short durations.

Method 

Materials 

Stimuli. Two sets of seven steady-state vowels were synthesized on the
vocal tract analogue synthesizer at the Research Laboratory of Electronics,
Massachusetts Institute of Technology. Table 1 provides the frequencies of the
first three formants for both sets of vowels. The fourth and fifth formants
were fixed at 3500 Hz and 4500 Hz respectively. The seven stimuli were arranged
so that the first three formants varied in equal logarithmic steps from the
English vowels /i/ through /I/. The formant frequencies chosen were identical to
those used by Stevens et al. (1969) in their cross-language study of vowel
perception.

TABLE 1: Formant frequencies for vowel stimuli.

                        Formant Frequency (Hz)
Stimulus Number       F1        F2             F3

       1              270       2300           3019
       2              285       2262           2960
       3              298       2226           2902
       4              315       2180           2836
       5              336       [illegible]    2776
       6              353       2103           2719
       7              374       2070           2666
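The "equal logarithmic steps" described for the formant continuum can be checked with a short sketch. It interpolates geometrically between the endpoint values of Table 1 and reports how far the tabled F1 and F3 values depart from an ideal log-spaced series. The function names are illustrative assumptions, and F2 is left out only because one of its tabled entries is not legible.

```python
def log_spaced(start_hz, end_hz, n_steps):
    """Return n_steps frequencies from start_hz to end_hz in equal log steps."""
    ratio = (end_hz / start_hz) ** (1.0 / (n_steps - 1))
    return [start_hz * ratio ** k for k in range(n_steps)]


# F1 and F3 values (Hz) for stimuli 1-7, copied from Table 1.
TABLE_F1 = [270, 285, 298, 315, 336, 353, 374]
TABLE_F3 = [3019, 2960, 2902, 2836, 2776, 2719, 2666]


def max_relative_deviation(tabled):
    """Largest relative gap between tabled values and an ideal log-spaced series."""
    ideal = log_spaced(tabled[0], tabled[-1], len(tabled))
    return max(abs(t - i) / i for t, i in zip(tabled, ideal))


if __name__ == "__main__":
    for name, values in (("F1", TABLE_F1), ("F3", TABLE_F3)):
        print(name, round(max_relative_deviation(values), 4))
```

On this reading the tabled values follow a log-spaced series only approximately, which is what one would expect if the published frequencies were rounded or constrained by the synthesizer.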



One set of stimuli had a steady-state duration of 300 msec (the equivalent
of approximately 30 glottal pulses) with a rise and decay time of 50 msec. The
second set of stimuli had a steady-state duration of 50 msec (the equivalent of
five glottal pulses) with a rise and decay time of 10 msec. Both sets of stimuli
had identical formant frequency values and had a falling fundamental frequency.
F0 fell linearly from 125 Hz to 80 Hz for the long vowels and from 125 Hz to 100 Hz
for the short vowels. Bandwidths of the first three formants were fixed at 50
Hz, 80 Hz, and 110 Hz respectively. The stimuli were originally recorded on
magnetic tape at MIT and then digitized on the PCM system at Haskins Laboratories
where the waveforms were stored on disc for test preparation. These stimuli are
identical to those used by Pisoni (1971).

Experimental tapes. All the experimental tapes were produced under computer
control from the digital values of these stimuli. A 1000 Hz tone was placed at
the beginning of each tape to ensure that the playback levels would be uniform
throughout the testing sessions.

Four different 70-item identification test sequences were prepared for each
set of vowel stimuli. Each identification test contained ten different
randomizations of an entire set of seven stimuli. The stimuli were recorded singly
with a 4 sec interval between presentations and an 8 sec interval after every
ten trials.

Four different 88-trial ABX discrimination tapes were also constructed for
each set of stimuli. All possible one- and two-step comparisons of the seven
stimuli appeared twice within each tape. The tapes were balanced, with the
restriction that each ABX triad occur equally often within each half of the test.
Stimuli within each triad were separated by 1 sec, while successive triads were
separated by 4 sec. There was an 8 sec pause after every ten trials.

Subjects

Eighteen undergraduate students at Indiana University served as Ss. They
were obtained from the Psychology Department's subject pool and received either
1 hour course credit or $1.50 for each session. All Ss were right-handed native
speakers of English with no history of a hearing or speech disorder. None of
the Ss had heard any synthetic speech before the present experiment.

Procedure 

The experimental tapes were reproduced on a high quality tape recorder
(Ampex AG-500) and were presented binaurally through Telephonics (TDH-39) matched
and calibrated headphones. The gain of the tape recorder playback was adjusted
to give a voltage across the earphones equivalent to 70 db SPL re 0.0002 dyn/cm^2
for a 1000 Hz calibration tone. To compensate for the differences in loudness
between the 300 msec and 50 msec vowels due to stimulus duration, the gain for
the calibration tone on the 50 msec vowel tapes was adjusted by means of decade
attenuators to be 8 db above the 300 msec vowels. Measurements were made on a
VTVM (Hewlett-Packard Model 1031) before presentation of each experimental tape.
Ss were run in two counterbalanced groups of nine Ss each. They were tested in a
small experimental classroom. All Ss in a given session heard the same stimuli
in the same order.

The E read aloud to the Ss a set of instructions which explained the nature
of the experiment. Ss also had a set of printed instructions before them. Ss
were told that this was an experiment dealing with speech perception. For the
identification tests, Ss were required to identify each stimulus as either the
vowel /i/ as in "beet" or /I/ as in "bit." In the ABX discrimination tests Ss
were told that the stimuli would be arranged in groups of three and that their
task was to decide whether the third sound was more like the first sound or the
second sound. Ss were told to guess if they were not sure, but to respond on
every trial. Judgments were recorded in prepared response booklets.

Ss were run for an hour a day on four consecutive days. On the first two
days one group received the 300 msec vowels and the other group received the
50 msec vowels. The conditions were reversed for each group on the last two
days. An identification test for a given stimulus condition was always followed
immediately by the corresponding ABX discrimination tests. When the data are
combined over the four sessions, each S provided 40 identification responses to
each of the 7 stimuli in both the 300 and 50 msec vowel conditions. Each S also
provided 32 judgments for each of the AB discrimination comparisons in each
stimulus condition.
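The trial counts quoted in the Method section can be reproduced with a small enumeration. The sketch assumes that each stimulus pair is presented in the four usual ABX arrangements (ABA, ABB, BAA, BAB); the text does not state this, but it is the arrangement that yields the reported totals of 88 trials per tape and 32 judgments per comparison. All names are illustrative.

```python
N_STIMULI = 7
N_TAPES = 4               # tapes per stimulus set, for both test types
REPEATS_PER_TAPE = 2      # each triad appears twice within an ABX tape
RANDOMIZATIONS = 10       # randomizations of the stimulus set per identification tape


def comparisons(step):
    """Stimulus pairs (a, b) that lie `step` positions apart on the continuum."""
    return [(a, a + step) for a in range(1, N_STIMULI - step + 1)]


def triads(pair):
    """Assumed set of ABX arrangements for one pair: ABA, ABB, BAA, BAB."""
    a, b = pair
    return [(a, b, a), (a, b, b), (b, a, a), (b, a, b)]


pairs = comparisons(1) + comparisons(2)            # 6 one-step + 5 two-step pairs
trials_per_tape = sum(len(triads(p)) for p in pairs) * REPEATS_PER_TAPE
judgments_per_pair = len(triads(pairs[0])) * REPEATS_PER_TAPE * N_TAPES
identifications_per_stimulus = RANDOMIZATIONS * N_TAPES

if __name__ == "__main__":
    print(trials_per_tape, judgments_per_pair, identifications_per_stimulus)
```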

Results 

The probabilities of identification averaged over the 18 Ss for each
stimulus condition are given in Table 2. The identification probabilities for
both stimulus conditions are quite sharp and consistent. Examination of Table 2
shows that the probabilities of identification for the two vowel conditions are
very nearly exact complements of each other. There is a slight shift in
crossover point or phonetic boundary between /i/ and /I/ as stimulus duration is
reduced from 300 msec to 50 msec; the boundary shifts predictably in favor of the
short, lax vowel /I/.

TABLE 2: Probabilities of identification averaged over
18 Ss for 300 msec and 50 msec stimuli.

300 msec Vowels

                        Stimulus Number
Response     1      2      3      4      5      6      7

  /i/      .998   .997   .968   .635   .141   .033   .018
  /I/      .002   .003   .032   .365   .859   .967   .982

50 msec Vowels

                        Stimulus Number
Response     1      2      3      4      5      6      7

  /i/      .980   .979   .871   .419   .071   .025   .021
  /I/      .020   .021   .129   .581   .929   .975   .979



The average one- and two-step obtained ABX discrimination functions are
shown in Figure 2 for both stimulus conditions. The predicted discrimination
functions, which were derived from the identification probabilities according to
the Haskins Laboratories model (Liberman et al., 1957; Pollack and Pisoni, 1971)
are also plotted in Figure 2. The predicted functions represent what would be
expected under the strong categorical perception assumption: that discrimination
is no better than absolute identification.
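As a rough illustration of how such predicted functions are derived, the sketch below applies the commonly cited two-category form of the Haskins prediction, P(correct) = 1/2 [1 + (pA - pB)^2], to the 300 msec identification probabilities of Table 2. The formula is given here as an assumption about the form of the model rather than a quotation from this report, and the function names are illustrative.

```python
def haskins_abx(p_a, p_b):
    """Predicted proportion correct in ABX when discrimination rests on labels only.

    p_a, p_b: probabilities that stimuli A and B are identified as /i/.
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


# Probability of an /i/ response for stimuli 1-7 (300 msec vowels, Table 2).
P_I_300 = [0.998, 0.997, 0.968, 0.635, 0.141, 0.033, 0.018]


def predicted_function(p_ident, step):
    """Predicted scores for all comparisons `step` stimuli apart."""
    return [
        haskins_abx(p_ident[k], p_ident[k + step])
        for k in range(len(p_ident) - step)
    ]


if __name__ == "__main__":
    print([round(v, 3) for v in predicted_function(P_I_300, 1)])
    print([round(v, 3) for v in predicted_function(P_I_300, 2)])
```

Under this form the prediction never falls below .50 and peaks for the pair that straddles the phonetic boundary (stimuli 4 and 5 in the one-step case), which is the peak-and-trough shape that the strong categorical assumption implies.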

The obtained discrimination functions for both vowel conditions show peaks
at the phonetic boundary and troughs within phonetic categories. Analysis of
variance indicates that discrimination performance is significantly better on
the 300 msec vowels than on the 50 msec vowels, F(1,16) = 9.59, p < .01, but only
for the one-step comparisons. This finding is consistent with Fujisaki and
Kawashima (1970) and Pisoni (1971). The two-step obtained discrimination
functions did not differ significantly from each other. There was a significant
difference between obtained and predicted discrimination scores for both the one-
and two-step comparisons, F(1,16) = 77.27, p < .001 and F(1,16) = 343.12, p < .001,
respectively.

We may obtain a better quantitative idea of these results by comparing the 
obtained discrimination functions to those predicted from the model of categor- 
ical perception. We assume that in the ideal case of categorical perception 
there will be an exact mapping of the discrimination functions predicted from 
identification and the functions obtained in ABX discrimination. Although an 
exact mapping is rarely found, since the obtained functions are usually higher 
than the predicted, we can use this assumption to our advantage for comparative 
purposes.

We assume that the difference between the obtained and predicted
discrimination functions for a given condition represents a measure of the degree
to which that particular condition deviates from the predictions of the idealized
model. Hence, it follows that the smaller the discrepancy between the obtained
and predicted functions, the closer the obtained discrimination function will be
to the categorical model. If short vowels are more categorical than longer
vowels, we would expect a smaller difference between the obtained and predicted
functions for the 50 msec condition than for the 300 msec condition. The
analyses reported below were carried out by first calculating difference scores on
the obtained and predicted data for each S and then performing separate analyses
of variance on the one- and two-step scores. Of greatest interest is the
comparison between the long and short vowel conditions.
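The difference-score measure described above amounts to subtracting the predicted score from the obtained score for each stimulus comparison. A minimal sketch follows; the function names are illustrative, and the obtained scores themselves would have to be read from Figure 2, since they are not tabled in the text.

```python
def difference_scores(obtained, predicted):
    """Obtained minus predicted proportion correct, one value per comparison."""
    if len(obtained) != len(predicted):
        raise ValueError("obtained and predicted must cover the same comparisons")
    return [o - p for o, p in zip(obtained, predicted)]


def mean_difference(obtained, predicted):
    """Average departure from the categorical prediction for one condition."""
    scores = difference_scores(obtained, predicted)
    return sum(scores) / len(scores)
```

On this measure a smaller mean difference indicates performance closer to the idealized categorical model, so the hypothesis under test is that the mean difference is smaller for the 50 msec condition than for the 300 msec condition.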

Analysis of variance on the one-step difference scores revealed a
significant effect for stimulus duration, F(1,16) = 12.80, p < .005. The
difference between the obtained and predicted scores was greater for the longer
vowels than for the shorter vowels. There was also a significant main effect for
stimulus comparison, F(5,80) = 3.68, p < .01. None of the interactions reached
statistical significance.

A similar analysis on the two-step data failed to find a significant
difference for the main effect of vowel duration, although the stimulus comparison
did reach significance again, F(4,64) = 9.44, p < .001. In addition, the vowel
duration by stimulus comparison interaction was significant, F(4,64) = 3.79,
p < .01.

Discussion 

The results of this experiment seem to indicate that vowels of both long
and short duration may be perceived in a categorical-like mode. Differences in
discrimination, as they are related to stimulus duration, are revealed only in
the one-step comparisons. This finding is consistent with the results reported
by Fujisaki and Kawashima (1970) and Pisoni (1971). Although there was no
overall effect of vowel duration for the two-step data, differences restricted to
particular types of stimulus comparisons along the continuum did occur. These
results appear to be due to the apparent difference in the location of the
phonetic boundary between /i/ and /I/ under the two conditions of duration.
Since Fujisaki and Kawashima (1970) employed only one-step stimulus comparisons,
the present two-step data have little bearing on their results or conclusions in
this regard.

The major outcome of this experiment may appear to be somewhat at variance
with previous studies of vowel discrimination, particularly the original vowel
perception studies conducted by investigators at Haskins Laboratories (Fry et al.,
1962; Stevens et al., 1969). In these studies, vowel discrimination was
described as more nearly continuous than categorical. However, Stevens et al.
(1969) did find some evidence for peaks in the discrimination functions which
were correlated with changes in identification, but the troughs in the
discrimination functions were well above chance when contrasted with the
discrimination data typically found with the stop consonants. Although the
discrimination functions, particularly the two-step data, appear by inspection to
be categorical, we note that performance within phonetic categories is in fact
well above chance. An auditory, nonphonetic basis for discrimination is available
to the listener.

One of the major weaknesses of the original Haskins model of categorical
perception is its failure to account for within-category discrimination
performance that may be at a level well above chance. In the model (Liberman et
al., 1957; Liberman et al., 1961; Pollack and Pisoni, 1971) it was assumed
explicitly that if a listener identifies two stimuli as the same he can
discriminate them only by chance.

In the model developed by Fujisaki and Kawashima (1970), performance that
is above chance on within-category comparisons is assumed to reflect the
underlying contribution of auditory short-term memory to ABX discrimination. Thus,
Fujisaki and Kawashima do not assume that discrimination is at chance within
categories, but rather that the level of within-category performance reflects the
contribution of auditory short-term memory to discrimination.

Two assumptions are implicit in the model shown in Figure 1. First,
discrimination will be based on phonetic information if the first two members of
an ABX triad (A and B) are judged by the listener to be different phonetic
segments. Second, discrimination will be based on auditory short-term memory if
the first two members of an ABX triad have been judged to be the same phonetic
segments.
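The two assumptions can be made concrete by enumerating every labelling outcome for an ABX triad. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes that A, B, and X are labelled independently, that X repeats A or B equally often, and that `m_same` (a hypothetical parameter name) is the probability of a correct response when auditory memory must carry the decision.

```python
from itertools import product


def abx_by_enumeration(p1, p2, m_same):
    """Exact P(correct) under the two-branch decision rule.

    p1, p2: probability that stimulus A (resp. B) is labelled /i/.
    m_same: assumed P(correct) from auditory memory when A and B share a label.
    """
    p_label = {"A": p1, "B": p2}
    total = 0.0
    for x_is in ("A", "B"):                      # X repeats A or B equally often
        for a_i, b_i, x_i in product((True, False), repeat=3):
            prob = 0.5
            prob *= p1 if a_i else 1 - p1
            prob *= p2 if b_i else 1 - p2
            px = p_label[x_is]
            prob *= px if x_i else 1 - px
            if a_i != b_i:
                # Phonetic branch: pick the stimulus whose label matches X.
                choice = "A" if x_i == a_i else "B"
                total += prob * (choice == x_is)
            else:
                # Auditory branch: labels cannot separate A from B.
                total += prob * m_same
    return total
```

Setting `m_same` to 0.5 reduces the rule to pure guessing within categories, which is the strong categorical assumption; values above 0.5 raise the within-category floor without moving the location of the boundary peak.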

Following Fujisaki and Kawashima, the predicted correct ABX discrimination 
score may be expressed by the following two components: 

\[
P_{ABX} = C_{A \neq B} + C_{A = B}
        = C_{A \neq B} + M_{A = B} \cdot P_{A = B}
\]

where $C_{A \neq B}$ = the probability that a correct discrimination occurs on
the basis of phonetic identification.

$C_{A = B}$ = the probability that a correct discrimination occurs on the basis
of auditory short-term memory.

$P_{A = B}$ = the probability that stimuli A and B are identified as the same
phonetic segments.

$M_{A = B}$ = the conditional probability that a correct discrimination takes
place when A and B are identified as the same phonetic segments.
This quantity indicates the degree to which judgments are based on
auditory short-term memory and is equal to the asymptotic value of
$P_{ABX}$ at the extremes of the stimulus range (i.e., within-category
comparisons).

These components are related according to the following equations:

\[
C_{A \neq B} = \tfrac{1}{2} \left[ (P_1 - P_2)^2 + P_1 (1 - P_2) + P_2 (1 - P_1) \right]
\]

\[
C_{A = B} = \left[ P_1 P_2 + (1 - P_1)(1 - P_2) \right] \cdot M_{A = B}
\]

$P_1$ and $P_2$ represent the probabilities that stimuli A and B in the triad are
identified as the same phonetic segments in an absolute identification test.
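A minimal sketch of the two-component prediction follows, assuming the component forms C(A≠B) = 1/2 [(P1 - P2)^2 + P1(1 - P2) + P2(1 - P1)] and C(A=B) = [P1 P2 + (1 - P1)(1 - P2)] M. The function names are illustrative, and the value of M is a free parameter that would be estimated from within-category performance; no value is given in the text, so none is assumed here.

```python
def phonetic_component(p1, p2):
    """C(A != B): correct responses attributable to phonetic labels."""
    return 0.5 * ((p1 - p2) ** 2 + p1 * (1 - p2) + p2 * (1 - p1))


def same_label_probability(p1, p2):
    """P(A = B): probability that A and B receive the same phonetic label."""
    return p1 * p2 + (1 - p1) * (1 - p2)


def predicted_abx(p1, p2, m_same):
    """P(ABX) = C(A != B) + M(A = B) * P(A = B)."""
    return phonetic_component(p1, p2) + m_same * same_label_probability(p1, p2)


if __name__ == "__main__":
    # With m_same = 0.5 the prediction equals 0.5 * (1 + (p1 - p2) ** 2),
    # i.e. chance performance whenever A and B receive the same label.
    print(predicted_abx(0.635, 0.141, 0.5))
```

Because M enters only through the same-label term, raising it above 0.5 lifts the troughs of the predicted function most, where both stimuli almost always receive the same label, and affects the boundary peak least.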

A new set of predicted ABX discrimination scores was obtained from the
model outlined above. Figure 3 shows the obtained one- and two-step
discrimination functions along with the new predicted functions derived from
Fujisaki and Kawashima's model. Examination of this figure indicates that the new
predicted discrimination functions match the obtained functions much more closely
than the traditional Haskins predictions. However, it should be pointed out that
the better fit of the obtained data is to be expected since one parameter from the
obtained data has been used in the predictions. The simplicity and advantage
of the Haskins model lies in the fact that no additional assumptions or data
are required to predict discrimination performance under the strong categorical
assumption.

It is possible that the results of this experiment were somehow due to the
particular test procedures used. Although there was no significant main effect
for order of presentation nor any interactions in the raw score analysis of
variance, there appeared to be some slight differences in discrimination of
short vowels depending upon the order of presentation over the experimental
sessions. Discrimination of the short vowels was better if these stimuli were
presented on the last two days of the experiment than if they were presented on
the first two days.

An additional experiment was run to examine the possibility that test order
effects might be responsible for some of the differences. Seven new Ss were
obtained and run in two completely randomized and counterbalanced groups; there
were four Ss in one group and three in the other. The same experimental tapes
and procedures were used. The only difference introduced was that Ss received
both short and long vowels on each day of testing, with the order reversed for
each group.

Figure 4 shows the one- and two-step discrimination functions for short and
long vowels. These results are basically quite similar to those obtained in the
main study. An analysis of variance on the differences between the obtained and
predicted discrimination scores was performed. The results indicated that neither
the main effects (i.e., vowel duration and stimulus comparison) nor any of the
interactions were significant. Thus, the differences between long and short
vowels for the one-step data found in the previous experiment do not occur when
possible order effects are controlled across testing sessions.

Inspection of Figure 4 reveals that discrimination is also somewhat
categorical. The one-step obtained functions seem to map onto the predicted
Haskins functions reasonably well, although they differ systematically by a
constant. The two-step obtained discrimination functions are quite similar to
those of the larger experiment. They also differ from the predicted scores.

Using Fujisaki and Kawashima's model, predicted discrimination functions
were also calculated for these data. Figure 5 shows these new functions along
with the Haskins predictions. Again, a better fit is obtained by accounting
for the contribution of auditory short-term memory to discrimination.

To summarize, these experiments have shown categorical-like discrimination 
functions for both short (50 msec) vowels and longer (300 msec) vowels. Although 
there were peaks in the discrimination functions, the level of within-category 
discrimination was well above chance expectation. When the contribution of audi- 
tory short-term memory is included in the predicted discrimination functions, 
according to Fujisaki and Kawashima's model, relatively better fits are obtained 
for the observed discrimination scores for both vowel conditions. These results 
suggest two conclusions. First, stimulus duration taken alone 
appears to have relatively little effect on the shape of the discrimination 
functions. This is in agreement with an earlier observation reported by Pisoni 
(1971). Second, the type of categorical perception observed with these vowels 
is basically different from that observed in previous studies with the stop con- 
sonants. We suggest that the peaks and troughs in discrimination observed in 
the present study are primarily due to the nature of the ABX test procedure. 
The arrangement of stimuli in this test format may prevent listeners from re- 
trieving the auditory information needed for discrimination and subsequently 
may force listeners to rely more heavily on phonetic coding in short-term memory. 

EXPERIMENT II 

This experiment is concerned with interference effects in vowel discrimina- 
tion. Stevens (1968), Sachs (1969), and Fujisaki and Kawashima (1970) have re- 
ported that vowels in fixed contexts are perceived more categorically than the 
same vowels in isolation. Fujisaki and Kawashima (1970) suggested that the added 
context served as a "perceptual anchor" or reference. However, the context could 
act to interfere selectively with the retention of both auditory and phonetic 
information. If the perceptual anchor hypothesis is correct, it should be of 
little consequence where the reference context is placed (i.e., before or after 
the target vowel). However, if the context does selectively interfere with the 
encoding and retention of information, then temporal position of the reference or 
interference sound should show differential effects on discrimination. In addi- 
tion, interference should be related to the similarity of the context and target 
vowels. 

Method 

Materials 

Stimuli. The 50 msec short vowel continuum from the first experiment was 
used as the basic stimulus set. Four types of interfering stimuli were then con- 
structed and used as contexts for each of the original seven stimuli. The inter- 
fering stimuli were 50 msec in duration and equal in overall intensity to the 
original vowels. The stimuli consisted of the following: (1) a 1000 Hz pure 
tone, (2) a burst of white noise, (3) the vowel /a/, and (4) the vowel /e/. Each 
type of interfering context either preceded the target vowel (proactive inter- 
ference condition) or followed the target vowel (retroactive interference condi- 
tion). The original set of seven vowels was also presented alone as a control, 
and will be referred to as the silent condition. 

Experimental tapes. Two types of identification and discrimination tests 
were prepared for each of the four types of interfering contexts. Two different 
70-item identification tests were prepared for each of the proactive and retro- 
active interference conditions. In addition, four different 88-item ABX discrim- 
ination tapes were also constructed for each of these conditions. The identifi- 
cation and discrimination tapes for the silent condition were the same as those 
used in the previous two experiments. The test orders and timing sequences par- 
alleled the test orders described in Experiment I. 
Subjects 



Twenty undergraduate students served as Ss. They were paid at the rate of 
$2.00 per hour for their services and met the same requirements as those Ss used 
in the previous experiments. 

Procedure 

The procedure was similar to that used in the previous experiment except 
for the following differences. Ss were run in four separate groups of five Ss 
each. Each group received one of the four interference conditions. Under a 
given condition, Ss received identification and discrimination tests for the 
silent (control) condition, and for the proactive and retroactive interference 
conditions. 

The instructions were the same as those used previously, except that when 
Ss were run under proactive or retroactive interference conditions they were 
told to ignore the interfering sound and to try to concentrate on only the target 
vowels, /i/ and /I/. 

Ss were run for one and a half hours a day on two consecutive days. On each 
day Ss received the silent vowel condition first, followed then by the proactive 
and retroactive conditions in differing order. Before each ABX discrimination 
test, Ss received the corresponding identification test: silent, retroactive, or 
proactive condition. 

Results and Discussion 

Table 3 shows the average one- and two-step percent correct discrimination 
scores for the silent, proactive, and retroactive context conditions under each 
of the four types of interference. These scores have been summed over the stim- 
ulus comparisons. Discrimination performance is generally lowest in the retro- 
active condition and highest in the silent condition for each type of interfer- 
ence. Analyses of variance were performed separately on the one- and two-step 
scores. The main effect for context position was significant for both analyses: 
F(2,24) = 5.98, p < .01 for the one-step scores and F(2,24) = 23.65, p < .001 
for the two-step scores. Although the main effect for type of interference did 
not reach significance in either analysis, there was a significant interaction 
between type of interference and context position for the two-step data, 
F(6,24) = 3.76, p < .01. 

The major results of this study are predicted by the interference assump- 
tion: there is more retroactive interference than proactive interference in 
vowel discrimination. Moreover, as shown in Table 3, there is more retroactive 
interference for a more similar vowel (e.g., /e/) than a less similar vowel 
(e.g., /a/) for targets /i/ and /I/. The interaction between context position 
and type of interference may also be due in part to the relatively better per- 
formance for the tone condition in all context positions. This result is not 
entirely surprising since we would expect tonal stimuli to have little effect on 
the initial encoding process for the target vowels used here. 

To summarize, this study provides evidence for interference effects in the 
discrimination of vowels in the ABX test paradigm. These effects seem to be 
greater when the interfering context follows a target vowel than when it pre- 
cedes the vowel. Moreover, the similarity of context and target vowel may be 
related to some interference with the initial encoding process when both auditory 
and phonetic information is registered in short-term store. 

The results of this study have several implications. First, these findings 
argue against Fujisaki and Kawashima's (1970) general "perceptual anchor" hypoth- 
esis because they indicate relatively specific temporal relations and similarity 
effects in discrimination. Second, these results may be generalized to the case 
of stop consonant syllables. It is possible that the extended duration of the 
vowel in a stop consonant syllable may act as a backward masking stimulus and 
preserve the integrity of the syllable as a perceptual unit (Massaro, 1972a; 
McNeill and Repp, 1973). 

EXPERIMENT III 

In this experiment we compare vowels of both short and long duration under 
two discrimination procedures: the traditional ABX test and the 4IAX test of 
paired similarity. If the categorical-like discrimination observed with these 
vowels in Experiment I is due mainly to the nature of the ABX test, we should 
expect to find differences between these two types of discrimination procedures. 
Moreover, since the differences in vowel discrimination appear to be due primari- 
ly to the availability of auditory information, we anticipate advantages in dis- 
crimination to reveal themselves on within- rather than between-phonetic cate- 
gory comparisons. 

Figure 6 shows the arrangement of stimuli in the traditional ABX test and the 
4IAX test of paired similarity. In the ABX test, pairs of stimuli are arranged in 
triads; the first two stimuli are always different, the third stimulus is identi- 
cal with either the first (A) or the second stimulus (B). This discrimination 
procedure requires that the subject encode and store each of the three stimuli 
over a relatively long time (e.g., several seconds) before arriving at a decision. 

In the 4IAX test, two pairs of stimuli are presented on every trial; one 
pair is always the same and one pair is always different. The Ss' task is to 
determine which pair contains the same stimuli, the first pair or the second 
pair. We assume that the 4IAX is more sensitive to purely auditory information 
since a decision can be made on a pair-wise comparison. The first two stimuli 
are compared and a difference, d1, is calculated and stored in short-term memory. 
The second pair of stimuli are compared and their difference, d2, is also calcu- 
lated and stored. A final decision may be obtained when the two differences are 
later recalled and compared. 
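
The pair-wise decision rule just described can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model: it assumes each stimulus is a point on a one-dimensional auditory scale heard with independent Gaussian noise, and the names `fouriax_trial`, `percent_correct`, and `noise_sd`, along with all numeric values, are hypothetical.

```python
import random


def fouriax_trial(a, b, same_first, noise_sd, rng):
    """Simulate one 4IAX trial; return True if the response is correct.

    a, b       -- hypothetical positions of stimuli A and B on an auditory scale
    same_first -- True if the identical pair comes first (A-A, then A-B)
    noise_sd   -- assumed standard deviation of the perceptual noise
    """
    pairs = [(a, a), (a, b)] if same_first else [(a, b), (a, a)]
    # Each stimulus is heard with independent noise.
    heard = [(x + rng.gauss(0, noise_sd), y + rng.gauss(0, noise_sd))
             for x, y in pairs]
    d1 = abs(heard[0][0] - heard[0][1])  # difference within the first pair
    d2 = abs(heard[1][0] - heard[1][1])  # difference within the second pair
    chose_first = d1 < d2  # the pair with the smaller difference is called "same"
    return chose_first == same_first


def percent_correct(a, b, noise_sd, n_trials=2000, seed=0):
    """Percent correct over n_trials, with the same pair first on about half."""
    rng = random.Random(seed)
    hits = sum(
        fouriax_trial(a, b, rng.random() < 0.5, noise_sd, rng)
        for _ in range(n_trials)
    )
    return 100.0 * hits / n_trials
```

With identical stimuli (`a == b`) the simulated observer performs near chance, and performance rises as the separation between A and B grows relative to `noise_sd`.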

Method 

Materials 

Stimuli. The 50 msec short vowel continuum and the 300 msec long vowel con- 
tinuum from Experiment I were used. 

Experimental tapes. The same identification tapes and ABX discrimination 
tapes from Experiment I were also used here. In addition, a new set of discrimin- 
ation tapes was prepared in 4IAX format for both vowel conditions. All possible 

DISCRIMINATION TESTS 

ABX TEST - PAIRS OF STIMULI ARRANGED IN TRIADS: 
ABA, BAB, ABB, BAA 
QUESTION: IS THE THIRD STIMULUS MORE LIKE THE FIRST OR SECOND STIMULUS? 

4IAX TEST - TWO PAIRS OF STIMULI ARE PRESENTED ON EACH TRIAL. ONE PAIR IS 
THE SAME AND ONE PAIR IS DIFFERENT: A-A, A-B; A-B, A-A; A-A, B-A; ETC. 

Figure 6: Details of the two discrimination procedures: the standard ABX test 
and the 4IAX test of paired similarity. 

one- and two-step comparisons of the seven stimuli in each continuum were em- 
ployed and arranged in the following 4IAX sequences: AA-AB, AA-BA, AB-AA, and 
BA-AA. Four different 88-item discrimination tapes were produced under computer 
control. The stimuli within each pair were separated by 150 msec, and stimulus 
pairs were separated by one sec. Successive trials were separated by five sec. 
After every ten trials there was an extra ten-sec pause. 
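
The construction of one such tape can be sketched as follows. Seven stimuli give six one-step and five two-step comparisons, and four arrangements of each give 44 trials. The sketch assumes each trial appears twice to reach 88 items, which the text does not state. The function names and the shuffling procedure are likewise hypothetical.

```python
import itertools
import random

STIMULI = range(1, 8)  # seven stimuli per continuum
ARRANGEMENTS = ["AA-AB", "AA-BA", "AB-AA", "BA-AA"]  # sequences named in the text


def comparisons(step):
    """All (A, B) stimulus pairs separated by `step` positions on the continuum."""
    return [(s, s + step) for s in STIMULI if s + step <= 7]


def build_tape(repetitions=2, seed=0):
    """Return a randomized list of 4IAX trials, each a tuple of four stimulus numbers.

    repetitions=2 is an assumption chosen so that the tape has 88 items.
    """
    trials = []
    all_pairs = comparisons(1) + comparisons(2)
    for (a, b), arrangement in itertools.product(all_pairs, ARRANGEMENTS):
        mapping = {"A": a, "B": b}
        sequence = tuple(mapping[ch] for ch in arrangement if ch != "-")
        trials.extend([sequence] * repetitions)
    random.Random(seed).shuffle(trials)
    return trials
```

Every trial produced this way contains exactly one identical pair and one different pair, matching the 4IAX format described above.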

Subjects 

Fourteen undergraduate students served as Ss. They were either paid for 
their services or received the equivalent in credit hours for their participa- 
tion as part of a course requirement. They met the same requirements as the Ss 
used in the previous experiments. 

Procedure 

The fourteen Ss were run in two groups of seven Ss each. One group was 
assigned to the long (300 msec) vowel condition; the other group was assigned to 
the short (50 msec) vowel condition. Thus, vowel duration was a between-Ss 
variable and the discrimination test type was a within-Ss variable. 

On each day, Ss first received the standard identification test for a given 
vowel condition. This was followed by both types of discrimination tests. Four 
Ss in each group received the discrimination tests in one order while the other 
three Ss were presented with the reverse arrangement. 

The instructions for identification and ABX discrimination were identical 
to those used in Experiment I. For the 4IAX discrimination test, Ss were told 
that they would hear two pairs of sounds on each trial and that their task was to 
determine which pair sounded more similar, either the first pair or the second 
pair. 

Results and Discussion 

Table 4 shows the average probabilities of identification for each stimulus 
condition. The data are averaged over the 7 Ss in each vowel condition. These 
data are almost identical to the probabilities obtained in the first experiment. 
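
Identification probabilities of this kind are the input to the predicted discrimination functions discussed for Experiment I. As a rough illustration, the sketch below applies the two-category form usually given for the Haskins ABX prediction, P(correct) = 0.5 [1 + (pA - pB)^2], to the /i/ probabilities in Table 4. That exactly this form was used for the predictions in this paper is an assumption, and the variable and function names are hypothetical.

```python
# Probability of an /i/ response for stimuli 1-7, copied from Table 4.
P_I_300 = [1.000, 1.000, 0.967, 0.681, 0.157, 0.005, 0.010]
P_I_50 = [0.967, 0.971, 0.824, 0.338, 0.119, 0.043, 0.029]


def predicted_abx(p_i, step):
    """Predicted proportion correct for each comparison `step` stimuli apart.

    Uses the two-category formula 0.5 * (1 + (pA - pB) ** 2), where pA and pB
    are the probabilities of the same response to the two stimuli compared.
    """
    return [
        0.5 * (1.0 + (p_i[k] - p_i[k + step]) ** 2)
        for k in range(len(p_i) - step)
    ]


if __name__ == "__main__":
    for label, probs in (("300 msec", P_I_300), ("50 msec", P_I_50)):
        for step in (1, 2):
            scores = ", ".join(f"{p:.3f}" for p in predicted_abx(probs, step))
            print(f"{label}, {step}-step: {scores}")
```

Under this formula the predicted score is at chance (0.5) whenever two stimuli are identified identically, so it predicts a peak only for comparisons that straddle the category boundary.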

Figure 7 shows the obtained discrimination functions for ABX and 4IAX dis- 
crimination for the two vowel conditions. Inspection of this figure reveals 
relatively large differences in discrimination between the two types of test pro- 
cedures. Performance is much better at every stimulus comparison for the 4IAX 
test than for the ABX test. This is true for both vowel conditions, although the 
effects are most noticeable for the long, 300 msec vowels. The difference be- 
tween the two discrimination tests was highly significant for both the one- and 
two-step comparisons, F(1,12) = 36.10, p < .001; and F(1,12) = 21.85, p < .001, 
respectively. The main effect of vowel duration did not reach significance in 
either the one- or two-step analysis. 

The most interesting result, however, is the interaction between type of 
discrimination test and stimulus comparison along the continuum. Both the one- 
step and two-step interactions were significant, F(5,60) = 3.79, p < .01 and 
F(4,48) = 3.76, p < .01. This result, taken together with the main effect of 

TABLE 4: Probabilities of identification averaged over 14 Ss for 300 msec 
and 50 msec stimuli. 

                         300 msec Vowels 

                         Stimulus Number 
          1       2       3       4       5       6       7 
/i/     1.000   1.000    .967    .681    .157    .005    .010 
/I/      .000    .000    .033    .319    .843    .995    .990 

                          50 msec Vowels 

                         Stimulus Number 
          1       2       3       4       5       6       7 
/i/      .967    .971    .824    .338    .119    .043    .029 
/I/      .033    .029    .176    .662    .881    .957    .971 

test type suggests that discrimination performance is not only better in the 4IAX 
test format but also that the shapes of the two discrimination functions are 
quite different. This result may be seen most clearly in the two-step discrimin- 
ation functions for the 300 msec vowels. A very distinct advantage of the 4IAX 
test over the ABX test for within-phonetic category comparisons may be seen in 
these data. Discrimination in the 4IAX test may be thought of as more nearly con- 
tinuous than categorical with these stimuli. 

We conclude that the advantage in discrimination is due to the retrieval of 
auditory information. As noted earlier, the ABX test forces Ss to rely more ex- 
tensively on phonetic rather than auditory coding in STM. Moreover, these re- 
sults suggest that the categorical-like discrimination observed in Experiment I 
for both long and short vowels was probably due almost exclusively to the par- 
ticular constraints of the ABX test rather than to some inherent property of the 
stimuli or to a limitation on the sensory capacities of the Ss. 

More generally, it would appear that the form of categorical discrimination 
observed with the vowels is in fact different from that observed with the stop 
consonants. Comparable manipulations of the experimental procedures (i.e., use 
of the 4IAX test) with a stop consonant continuum have thus far failed to show 
equivalent changes in either the overall level or the shape of the discrimination 
functions (Pisoni, 1971). Figure 8, taken from a recent paper by Pisoni and 
Lazarus (1973), shows the results obtained under ABX and 4IAX discrimination with 
a synthetic consonant continuum ranging in voice onset time from /ba/ through 
/pa/. These consonant data were collected under the same experimental conditions 
as for the vowel data in the present study. Ss first took an absolute identifi- 
cation test and then received either an ABX test or a 4IAX test. The discrimina- 
tion functions show some slight advantage in favor of the 4IAX test, but overall 
the obtained functions still match the predicted ones fairly well. We do not 
preclude the possibility that auditory information can be employed in consonant 
discrimination; rather, we assume that auditory information from the earliest 
stages of processing tends to be lost from STM much more rapidly than phonetic 
information. As a result, decisions that require phonetic coding will be more 
accurate and reliable than decisions that require a comparison of auditory infor- 
mation in STM. 

GENERAL DISCUSSION 

The experiments reported in this paper have been concerned with the role of 
auditory short-term memory in vowel perception and more generally with the rela- 
tionship between auditory and phonetic coding in speech perception. The main 
findings of these studies indicate that vowels of both short (50 msec) and longer 
(300 msec) duration appear to be discriminated in a categorical-like mode; there 
is a peak in the ABX discrimination functions for stimulus comparisons selected 
from different phonetic categories and a trough in these discrimination functions 
for comparisons selected from within the same phonetic category. Stimulus 
duration per se was shown to play a relatively minor role in contribut- 
ing to the shape and level of the discrimination functions. The categorical-like 
discrimination for the vowels was assumed to reflect the greater dependence on 
phonetic rather than auditory coding in the ABX format. Support for this con- 
clusion was obtained in two additional experiments. One study demonstrated 
specific temporal and similarity interference effects in ABX discrimination; the 
other study showed that vowel discrimination could be substantially improved 
when auditory information in STM is made more readily available for use in dis- 
crimination. 

A major issue in speech perception has been the distinction between categor- 
ical and continuous modes of processing as reflected in the differences in dis- 
crimination between consonants and vowels. Despite several recent findings, we 
conclude that meaningful and theoretically important differences still exist be- 
tween consonants and vowels. Moreover, we suggest that differences between cate- 
gorical and continuous discrimination are primarily due to a failure of retrieval 
of auditory information in STS. The earliest stages of auditory processing of 
speech sounds tend to be lost from subsequent processing. Loss of this informa- 
tion may be due both to interference from succeeding acoustic events and to the 
decay of auditory information over time. 

REFERENCES 

Cole, R. A. (1973) Different memory functions for consonants and vowels. Cog. 
     Psychol. 4, 39-54. 

Crowder, R. G. (1971) The sound of vowels and consonants in immediate memory. 
     J. Verbal Learn. Verbal Behav. 10, 587-596. 

Crowder, R. G. (1973a) Representation of speech sounds in precategorical 
     acoustic storage. J. Exp. Psychol. 98, 14-24. 

Crowder, R. G. (1973b) Precategorical acoustic storage for vowels of short and 
     long duration. Percept. Psychophys. 13, 502-506. 

Crowder, R. G. and J. Morton. (1969) Precategorical acoustic storage (PAS). 
     Percept. Psychophys. 5, 365-373. 

Darwin, C. J. (1971) Dichotic backward masking of complex sounds. Quart. J. 
     Exp. Psychol. 23, 386-392. 






Delattre, P. C., A. M. Liberman, F. S. Cooper, and L. J. Gerstman. (1952) 
     Observations on one- and two-formant vowels synthesized from spectrographic 
     patterns. Word 8, 195-210. 

Fry, D. B., A. S. Abramson, P. D. Eimas, and A. M. Liberman. (1962) The identi- 
     fication and discrimination of synthetic vowels. Lang. Speech 5, 171-189. 

Fujisaki, H. and T. Kawashima. (1968) The influence of various factors on the 
     identification and discrimination of synthetic speech sounds. Reports of 
     the 6th International Congress on Acoustics, Tokyo, August. 

Fujisaki, H. and T. Kawashima. (1969) On the modes and mechanisms of speech 
     perception. Annual Report of the Engineering Research Institute, Faculty 
     of Engineering, University of Tokyo 28, 67-73. 

Fujisaki, H. and T. Kawashima. (1970) Some experiments on speech perception 
     and a model for the perceptual mechanism. Annual Report of the Engineering 
     Research Institute, Faculty of Engineering, University of Tokyo 29, 207-214. 

Glanzman, D. L. and D. B. Pisoni. (1973) Decision processes in speech discrim- 
     ination as revealed by confidence ratings. Paper presented at the 85th 
     meeting of the Acoustical Society of America, Boston, Mass., April. 

Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy. 
     (1967) Perception of the speech code. Psychol. Rev. 74, 431-461. 

Liberman, A. M., P. C. Delattre, F. S. Cooper, and L. J. Gerstman. (1954) The 
     role of consonant-vowel transitions in the perception of the stop and nasal 
     consonants. Psychol. Monogr. 68. 

Liberman, A. M., K. S. Harris, H. S. Hoffman, and B. C. Griffith. (1957) The 
     discrimination of speech sounds within and across phoneme boundaries. 
     J. Exp. Psychol. 54, 358-368. 

Liberman, A. M., K. S. Harris, J. Kinney, and H. Lane. (1961) The discrimina- 
     tion of relative onset-time of the components of certain speech and non- 
     speech patterns. J. Exp. Psychol. 61, 379-388. 

Liberman, A. M., I. G. Mattingly, and M. T. Turvey. (1972) Language codes and 
     memory codes. In Coding Processes in Human Memory, ed. by A. W. Melton and 
     E. Martin. (New York: V. H. Winston). 

Massaro, D. W. (1972a) Preperceptual images, processing time, and perceptual 
     units in auditory perception. Psychol. Rev. 79, 124-145. 

Massaro, D. W. (1972b) Preperceptual and synthesized auditory storage. 
     Studies in Human Information Processing, Wisconsin Mathematical Psychology 
     Program, University of Wisconsin, Madison 72-1. 

Mattingly, I. G., A. M. Liberman, A. K. Syrdal, and T. Halwes. (1971) Discrim- 
     ination in speech and nonspeech modes. Cog. Psychol. 2, 131-157. 

McNeill, D. and B. Repp. (1973) Internal processes in speech perception. 
     J. Acoust. Soc. Amer. 53, 1320-1326. 

Pisoni, D. B. (1971) On the nature of categorical perception of speech sounds. 
     Unpublished Ph.D. thesis, University of Michigan. (Issued as Supplement to 
     Haskins Laboratories Status Report on Speech Research.) 

Pisoni, D. B. (1973) Auditory and phonetic memory codes in the discrimination 
     of consonants and vowels. Percept. Psychophys. 13, 253-260. 

Pisoni, D. B. and J. H. Lazarus. (1973) Categorical and noncategorical modes 
     of speech perception along the voicing continuum. Unpublished manuscript. 

Pollack, I. and D. B. Pisoni. (1971) On the comparison between identification 
     and discrimination tests in speech perception. Psychon. Sci. 24, 299-300. 

Sachs, R. M. (1969) Vowel identification and discrimination in isolation vs. 
     word context. Quarterly Progress Report, Research Laboratory of Electron- 
     ics, Massachusetts Institute of Technology, Cambridge, Mass. No. 93, 220- 
     229. 






Stevens, K. N. (1968) On the relations between speech movements and speech 
     perception. Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunika- 
     tionsforschung 21, 102-106. 

Stevens, K. N., A. M. Liberman, M. Studdert-Kennedy, and S. E. G. Ohman. (1969) 
     Cross-language study of vowel perception. Lang. Speech 12, 1-23. 

Studdert-Kennedy, M. (1973) The perception of speech. In Current Trends in 
     Linguistics, Vol. XII, ed. by T. A. Sebeok. (The Hague: Mouton). 







Effects of Amplitude Variation on an Auditory Rivalry Task: Implications 
Concerning the Mechanism of Perceptual Asymmetries* 

Susan Brady-Wood^ and Donald Shankweiler^ 
Haskins Laboratories, New Haven, Conn. 



Right-ear superiority in perception of dichotically presented words was 
interpreted by Kimura (1961), who discovered the phenomenon, and most subsequent 
workers as a manifestation of the specialization of the left cerebral hemisphere 
for language. This interpretation is supported by the finding of reversed ear 
asymmetry (left-ear superiority) in persons known on other grounds to have 
atypical, right-hemisphere speech representation. Work at Haskins Laboratories 
has shown that the right-ear advantage may be obtained with nonsense syllables 
that contrast in only one consonant segment. Since a right-ear advantage is not 
obtained from just any noise made by the human vocal tract, nor from acoustic 
fragments of speech sounds isolated from the syllable, it was concluded that 
even separate speech sounds are perceived by the dominant hemisphere because of 
their linguistic functions. 

Right-ear superiority, as we understand it, requires that two conditions be 
met: 1) the stimulus material must require at some stage left-hemisphere pro- 
cessing (in general it has been found that nonspeech sounds do not); 2) the left 
ear's signal must undergo degradation in neural transmission to the left cere- 
bral cortex which makes it less likely to be processed than the signal arriving 
at the right ear. Although direct evidence is lacking that the crossed auditory 
pathways are physiologically stronger in man, electrophysiological work on cat 
shows that each ear commands more neural units in the opposite cerebral hemi- 
sphere. Thus we hypothesize with Kimura that the left-ear signal's disadvantage 
is related to the fact that the crossed connections from ear to brain prevail 
over the uncrossed; thus the right ear's connection with the speech-dominant 
left hemisphere is a privileged one. Right-ear superiority, then, is attributed 
jointly to an advantage in transmission of signals conveyed directly by the 
crossed auditory pathway and to lateral specialization of portions of the left 
cerebral cortex for some aspect(s) of the speech process. 



*Presented at the 85th meeting of the Acoustical Society of America, Boston, 
Mass., April 1973, under the title: Effects of Attenuation of One of Two 
Channels on Perception of Opposing Pairs of Nonsense Syllables when Monotically 
and Dichotically Presented. 

^Also University of Connecticut, Storrs. 
If the right-ear advantage can in part be attributed to the quantitative 
superiority of the crossed transmission line, then the ear advantage should be 
sensitive to manipulation in a regular and consistent way by varying the inten- 
sity of the two input signals. If this can be demonstrated, it may permit us to 
isolate, in a future experiment, the lateralized cortical, process-related com- 
ponent of the ear advantage from the transmission component reflecting varia- 
tions in the efficiency of the crossed pathway. These two components conceiv- 
ably vary independently, and they are confounded in usual measures of the ear 
advantage. Partialling out the effect due to transmission would clarify the 
interpretation of differences in the magnitude of the ear advantage for differ- 
ent phonetic classes and among different individuals for a given stimulus class. 

METHOD AND PROCEDURE 

In this experiment, we varied the intensity of the signal on one channel 
while holding the other signal at a constant level. Figure 1 shows the two con- 
ditions of competing stimulation which we compare here: the monaural or monotic, 
in which two synchronous signals are electrically mixed and then presented to 
one ear, and the dichotic condition, in which a different signal is presented to 
each ear. 

The syllables were the six stop consonants with the vowel /ae/. The stim- 
uli were prepared on the Haskins Parallel-Resonance Formant Synthesizer. They 
were digitized, edited for amplitude level by a computer-assisted routine, and 
output into a test order containing synchronous pairs of random combinations of 
syllables. The overall amplitude was attenuated in 5-db steps from a reference 
level, which was chosen to be a comfortable listening level of about 70 db SPL. 
On each trial pair, the subject heard one syllable at the reference level and 
the other attenuated by 5, 10, 15, 20, or 25 db. The subjects, who were under- 
graduate psychology students, were informed of the stimulus set and given prac- 
tice in identifying the syllables. They were asked to write down two different 
consonants for each trial, listing them in order of confidence. The data we 
present are based on the first response judgments. 
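
The attenuation steps can be related to linear amplitude with the standard decibel conversion, ratio = 10^(-dB/20). The sketch below is illustrative only: the computer-assisted editing routine used to prepare the stimuli is not described in the text, and the function names are hypothetical.

```python
REFERENCE_DB_SPL = 70.0  # approximate reference level given in the text
ATTENUATIONS_DB = [0, 5, 10, 15, 20, 25]  # 0 = both syllables at reference


def amplitude_ratio(attenuation_db):
    """Linear amplitude (pressure) ratio for an attenuation given in db."""
    return 10.0 ** (-attenuation_db / 20.0)


def scale_samples(samples, attenuation_db):
    """Return a copy of the digitized samples attenuated by attenuation_db."""
    gain = amplitude_ratio(attenuation_db)
    return [gain * s for s in samples]


if __name__ == "__main__":
    for att in ATTENUATIONS_DB:
        level = REFERENCE_DB_SPL - att
        print(f"{att:2d} db down: about {level:.0f} db SPL, "
              f"amplitude x {amplitude_ratio(att):.3f}")
```

On this conversion a 20 db attenuation corresponds to one tenth of the reference amplitude.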

RESULTS AND DISCUSSION 

The data from the monotic experiment provide a baseline for evaluation of 
the dichotic condition. In Figure 2, the zero point represents percent correct 
identifications with no attenuation of either channel. The points to the left 
of the zero point indicate the percent correct identification of the stimuli on 
the attenuated channel; the points to the right of zero indicate the percent 
correct on the unattenuated channel for varying degrees of attenuation of the 
second channel. Each point is based on 30 judgments for each of 12 subjects. 
Ten db of difference in gain produced a nearly asymptotic change in identifica- 
tion performance. As expected, it is a matter of indifference whether we pre- 
sent these competing signals to the right ear or to the left ear. 

Twenty-one subjects participated in the dichotic experiment in which twice 
as many (60) judgments per data point were collected. Figure 3 gives the plot 
of the data averaged for the 17 subjects who showed right-ear superiority at 
equal amplitude. Here the axes are the same: the plot shows the percent iden- 
tification of stimuli presented to each ear at each level of attenuation rela- 
tive to the opposite ear's signal. The effect of amplitude variation on perfor- 
mance is much less steep than was observed in the monotic condition. Thus, when 
the locus of competition is central, a given difference in amplitude biases per- 
ception far less than when the signals interact peripherally. The maximum am- 
plitude difference of 25 db does not produce an asymptotic decrement in perfor- 
mance. Results obtained by Stafford (1971) at the Kresge Laboratory (New 
Orleans) place asymptote at between 40 and 50 db. It will be noted that the 
functions are parallel. The right-ear advantage remains at a constant 4 db 
relative to the left ear when it is attenuated by the same amount. 

The 4 db difference appears as the cross-over point in Figure 4, which is 
simply the same data replotted to show the mean percent by ear for all trials of 
a given type. For example, the pair of points on the extreme right represents 
performance for those trials on which the left ear was attenuated by 25 db rela- 
tive to the right, and the corresponding points on the extreme left give per- 
formance for the reverse situation. The cross-over point, about 4 db from zero, 
shows the ear advantage which was displayed in Figure 3 as the difference be- 
tween the parallel lines. This point varied for individual subjects from 1 to 
14 db. There is a significant correlation of .80 between cross-over point and 
the degree of right-ear advantage when the signals are matched for amplitude. 
Variations in cross-over point reflect variations in left-ear gain required just 
to cancel the right-ear advantage. From this we may infer that individual 
differences in the magnitude of the right-ear advantage reflect, in part, dif- 
ferences in relative efficiency of the two transmission routes to the speech 
processor. 
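
One way a cross-over point of this kind could be estimated is by linear interpolation between the two measured levels at which the right-ear and left-ear curves change order. The sketch below is an assumption about procedure, not a description of how the value in Figure 4 was obtained. The function name is hypothetical and the data are made up for illustration.

```python
def crossover_point(levels_db, right_pct, left_pct):
    """Interpolate the level difference at which the two ear curves intersect.

    levels_db           -- interaural level differences in db, ascending order
    right_pct, left_pct -- percent correct for each ear at those levels
    Returns None if the curves do not cross within the measured range.
    """
    diffs = [r - l for r, l in zip(right_pct, left_pct)]
    if not diffs:
        return None
    for k in range(len(diffs) - 1):
        d0, d1 = diffs[k], diffs[k + 1]
        if d0 == 0:
            return float(levels_db[k])
        if d0 * d1 < 0:  # sign change: the curves cross inside this interval
            frac = d0 / (d0 - d1)
            return levels_db[k] + frac * (levels_db[k + 1] - levels_db[k])
    if diffs[-1] == 0:
        return float(levels_db[-1])
    return None


# Made-up illustrative data, NOT the values plotted in Figure 4.
# Positive level = right ear attenuated relative to the left ear.
levels = [-10, -5, 0, 5, 10]
right = [70.0, 65.0, 60.0, 55.0, 50.0]
left = [42.0, 47.0, 52.0, 57.0, 62.0]

if __name__ == "__main__":
    print(crossover_point(levels, right, left))
```

With these invented numbers the curves cross four db to the right of zero, which mimics the kind of offset described in the text. Applying the same function to each subject's curves would give the individual cross-over points.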

A further purpose of this experiment was to compare the effects of amplitude
differences on perception of double stimuli with the effects of varying the
relative time of onset. Drawing on earlier data for this comparison, we find
similar effects of amplitude and time differences in the monotic case, different
effects in the dichotic case.

A study by Studdert-Kennedy, Shankweiler, and Schulman (1970) introduced
temporal onset asynchronies of 10 to 120 msec. With monotic presentation of
stop-vowel syllables, temporally staggered by these amounts, a function very
similar to that shown in Figure 2 was obtained. As is characteristic of
peripheral masking, the advantage goes to the leading stimulus and the gain in
performance as lead time increases is steep. In the dichotic case, the results
were very different, but they are not parallel to those obtained when amplitude
is varied dichotically. Interaural amplitude changes, as is seen in Figure 3,
produce a linear effect on identification. By contrast, the effects of
interaural differences in time of arrival are highly nonlinear. The lagging
syllable has the edge in competition with the leading one; i.e., the masking
effect is backward. Thus, in contrast to the findings shown in Figure 3, the
plot giving identification as a function of onset asynchrony is asymmetric;
performance changes more rapidly with lag than with lead. We may infer that
interaural time differences affect the inputs after they have converged at the
terminal cortical processor; interaural amplitude differences, on the other
hand, affect the signals prior to entry to the cortical processor.

In concluding, we note that similar effects have been reported in studies
investigating the parameters of visual masking of letters and words. Turvey
(1973) found in studies employing a patterned mask that intensity and time
manipulations produce different effects. In monoptic masking of a letter target
by a pattern mask the relative intensities of target and mask are critical. By
contrast, in the dichoptic situation, where interaction of the signals occurs
only at a central level, intensity differences are of little importance. We have
shown a parallel difference in the effects of signal amplitude in monotic and
dichotic listening, as is seen from a comparison of Figures 2 and 3. Finally,
for both the auditory and visual modalities, the temporal direction of the
central component of masking is asymmetric, being chiefly in the backward
direction: processing of the first item is interrupted by a more recent input.

To summarize the principal findings of the present study, it was shown that
when one signal is presented at a fixed amplitude and the competing signal is
varied in 5 db steps, the function relating identification to the degree of
attenuation is linear (for amplitude differences at least as great as 25 db) and
relatively flat in the dichotic case. From the effects of attenuation on the
ear advantage, it was found that variations among individuals in the magnitude
of the right-ear superiority are in part determined by factors related to
transmission of the auditory signal prior to cortical processing. This, we hope,
will make possible the development of ways to isolate the two components of the
ear advantage.


Digit-Span Memory in Language-Bound and Stimulus-Bound Subjects*
Ruth S. Day+

Haskins Laboratories, New Haven, Conn.



Ordinarily individuals fall into a normal distribution in perception and
memory experiments. That is, when we plot the number of subjects who achieve
high, medium, and low scores in a task, we find that most fall in the middle
range, while some fall at the high and low ends. Such a distribution occurs for
a wide variety of scores, including, for example, percent correct identification
and number of trials to criterion.

DICHOTIC FUSION STUDIES 

Phonological fusion tasks in dichotic listening consistently do not yield a
normal distribution (Day, 1969). A series of tasks is given to the same
subjects, using the same dichotic tapes in each task. All items are of the
general form BANKET/LANKET, where the inputs to each ear differ only in their
initial phoneme. Furthermore, these initial elements can be fused into a cluster
in English. Thus BANKET/LANKET → BLANKET.¹

Identification task. Subjects are asked to report 'what they hear' on
every trial, be it "one word or two...a real word or a nonsense word." Often
subjects report hearing BLANKET, which is a fusion response: the /b/ and /l/
are sent to different ears yet are perceived as a fused cluster. Figure 1 shows
the frequency distribution of fusion scores for a typical group of subjects;
fusion scores are expressed as the proportion of trials on which each subject
fused. Clearly, the distribution is bimodal, with high fusers and low fusers at
either end and a marked absence of subjects in the middle range. Although
Figure 1 shows data for only 16 subjects, the bimodality has occurred over
hundreds of subjects in subsequent experiments.
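
The fusion score defined above (the proportion of trials on which a subject
reports the fused form) can be sketched as follows. This is only an
illustration: the function names, the trial representation, and the band cutoffs
used to look for an empty middle range are assumptions, not part of the original
procedure.

```python
def fusion_score(trials):
    """Proportion of trials on which the report equals the fused form.

    trials: list of (reported, fused_form) pairs, e.g. ("BLANKET", "BLANKET").
    """
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(reported == fused for reported, fused in trials) / len(trials)


def score_bands(scores, low=1 / 3, high=2 / 3):
    """Count subjects with low, middle, and high fusion scores.

    The cutoffs are arbitrary illustrative values, not taken from the paper.
    """
    bands = {"low": 0, "middle": 0, "high": 0}
    for s in scores:
        if s < low:
            bands["low"] += 1
        elif s > high:
            bands["high"] += 1
        else:
            bands["middle"] += 1
    return bands


# Hypothetical reports from one subject on four dichotic trials.
subject_trials = [
    ("BLANKET", "BLANKET"),
    ("BANKET", "BLANKET"),
    ("BLACK", "BLACK"),
    ("LACK", "BLACK"),
]
print(fusion_score(subject_trials))
```

A bimodal group would show up in `score_bands` as large "low" and "high" counts
with few or no subjects in "middle".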

Temporal order judgment (by phoneme). One could argue that the identification
task emphasizes processing at the word level. However, elsewhere (Day,
1968) it has been shown that fusions also occur when both inputs are acceptable
English words (e.g., BACK/LACK → BLACK), as well as when the fusion is a nonword



*Paper presented at the 85th meeting of the Acoustical Society of America,
Boston, Mass., 11 April 1973.

+Also Yale University, New Haven, Conn.

¹The arrow should be read "yields."

(e.g., GORIGIN/LORIGIN → GLORIGIN). Nevertheless, another test is needed, a
test which de-emphasizes combining all the information into a single response. A
temporal order judgment task meets this requirement (Day, 1969). The same
subjects listen to the same fusible dichotic items, but this time they need to
report only the leading phoneme in each pair. On half the trials, the stop
consonant (e.g., /b/ in BANKET) begins first by a short interval while on the
remaining trials the liquid (e.g., /l/ in LANKET) begins first. Subjects are
asked to report 'the first sound (phoneme) they hear,' that is, to make a
temporal order judgment (TOJ).

Typical TOJ-by-phoneme results are shown in Figure 2. At the top of the
display is a subject who was correct in determining temporal order when the stop
consonant led, but incorrect when the liquid led. Note that in English, stop +
liquid can occur in initial position, but liquid + stop cannot occur initially.
Hence, this subject reported what the language allows, not what the leading
phoneme was. Another type of subject is shown at the bottom of Figure 2. This
subject was highly accurate in judging the temporal order of fusible items, no
matter whether the stop or the liquid led. In summary, the first subject is a
poor judge of temporal order while the second is a good judge. Note that the
TOJ-by-phoneme task requires only phonetic processing of the initial stop and
liquid; it does not require phonological processing of these units into a
cluster. Nevertheless, some subjects seem unable to disengage phonological
processing mechanisms.

Relationship between fusion and TOJ-by-phoneme. So far we have considered
two tasks and have found contrastive performance among individuals in each. In
the identification task, there were high and low fusers, while in the
TOJ-by-phoneme task there were good and poor judges of temporal order. The
correlation between performance on the two tasks is shown in Figure 3. The high
fusers are poor judges of temporal order and have been termed "language-bound,"
since they are heavily influenced in both tasks by the phonological rules of the
language. The low fusers are good judges of temporal order and have been termed
"stimulus-bound," since they are highly accurate in reporting facts about the
stimulus conditions.

Temporal order judgment (by ear). A final task using the same subjects and
tapes again requires a TOJ. However, this time subjects are asked to report
"which ear led" on each trial. Such a judgment does not require linguistic
processing since the subject need not identify any phonemes. There is dramatic
improvement in this task: some language-bound subjects show increases in
performance of 50% or more on liquid-leading trials. Nonetheless, some still
perform better when the stop consonant leads, suggesting that they have not
totally disengaged phonological processing mechanisms.

Discussion. The general strategy in these dichotic fusion studies is to
determine the level at which an individual can disengage linguistic processing
mechanisms. Some do it readily, while others have great difficulty in doing it.

The individual differences in the dichotic fusion tasks appear to be stable
over time. Language-bound subjects given extended practice on the TOJ-by-phoneme
task may improve a small amount, but still show grossly inferior performance in
comparison with stimulus-bound subjects (Day and Thompson, in preparation).
These qualitative differences in perception appear to depend upon the extent to
which subjects are influenced by various linguistic constraints, even when
reference to these constraints is not necessary for task solution.

Figure 2: Percent correct temporal order judgment (TOJ) for dichotic trials where
the leading phoneme was either a stop consonant (e.g., /b/) or a liquid
(e.g., /l/). Each display was obtained from a single subject. (From
Day, 1969) [Panels: Subject I.C. and Subject R.L.; horizontal axis: lead time
(msec), stop led vs. liquid led.]

It is interesting that such dramatic differences occur in the dichotic
fusion tasks. However, if the differences occur only within these tasks, then
they are of limited interest. The major question, then, is: do subjects retain
their group identity (language-bound, stimulus-bound) on other cognitive tasks?
The present paper explores this question using the digit-span memory task.

DIGIT-SPAN MEMORY

In a typical digit-span test a series of numbers is presented in rapid
succession, for example, 3-2-6-8-5-4-7-1-9. The subject's task is to report all
the numbers in the exact order they were presented. He is given an answer sheet
with a horizontal array of blanks; from left to right, these blanks represent
the temporal order of the items. In order for a digit to be scored correct it
must be placed in the appropriate blank. Some typical results are shown in
Figure 4, which plots percent correct for each serial position, from the first
item to the last in the list. Performance is best at the beginning and end of
the list, and falls in the middle. These data represent the "classical serial
position curve." Superior performance at the end of the list is called the
"recency" effect; items here appear to be in a very temporary storage system.
Superior performance at the beginning of the list is called the "primacy"
effect; items here appear to be in a more permanent storage system. Such results
have been reported many times in the psychological literature.
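
The scoring rule above (a digit counts only if it sits in the blank for its
original position) and the resulting serial position curve can be sketched as
follows. The function names and the example responses are hypothetical; only the
sample list 3-2-6-8-5-4-7-1-9 comes from the text.

```python
def score_list(presented, reported):
    """True/False per position: a digit is correct only in its own blank."""
    if len(presented) != len(reported):
        raise ValueError("answer sheet must have one blank per presented digit")
    return [p == r for p, r in zip(presented, reported)]


def serial_position_curve(trials):
    """Percent correct at each serial position over (presented, reported) pairs."""
    if not trials:
        raise ValueError("at least one trial is required")
    n_positions = len(trials[0][0])
    hits = [0] * n_positions
    for presented, reported in trials:
        for i, ok in enumerate(score_list(presented, reported)):
            hits[i] += ok
    return [100.0 * h / len(trials) for h in hits]


# Hypothetical responses: the second report transposes the middle digits.
trials = [
    ("326854719", "326854719"),
    ("326854719", "326845719"),
]
print(serial_position_curve(trials))
```

Note that the transposed digits are both scored as errors even though the
subject recalled them, because each is in the wrong blank.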

Control condition. In the present experiment, language-bound and
stimulus-bound subjects were selected on the basis of their performance on the
dichotic fusion tests. The subjects were then given a series of digit-span
lists; each list contained nine digits and was spoken at a rate of two digits
per second. The data obtained are those already shown in Figure 4. These data,
however, have been averaged over the two groups of subjects. Figure 5 shows the
same data plotted separately for each group, and shows clear, quantitative
differences between the two groups: overall performance collapsed across all
serial positions was 88% correct for stimulus-bound subjects and 63% correct for
language-bound subjects (Mann-Whitney U = 8.5, p < .001).

Stimulus-bound subjects showed little evidence of a serial position effect,
while language-bound subjects showed a very marked serial position effect (with
performance dropping as low as 28% correct for the middle item). Despite these
apparent differences in the shapes of the two curves, caution must be exercised
in concluding that there are qualitative differences in memory. The large
difference in overall performance level between the two groups makes such
inferences difficult from a statistical point of view.²

It is important to emphasize the contrast between the averaged data
(Figure 4) and the data separated by groups (Figure 5). If indeed language-bound
and stimulus-bound characteristics are representative of the population at
large, then the classical serial position curve could be one that all
experimenters obtain but that no individual subjects display. Hence we could be
building models for performance that does not occur.



²It could be that both groups display the serial position effect, but that
high-performance "ceiling effects" are concealing it in the stimulus-bound data.
This problem will be reconsidered later.

Suffix condition. A variant of the digit-span task was also given. At the
end of each list the word "zero" was added. Subjects were told that they did
not need to report the zero, but to use it as a cue to begin their recall. Thus
the zero acts as a redundant suffix. Crowder and Morton (1969) have shown that
the suffix hurts items at the end of the list, the items that are in a very
temporary storage system. Items elsewhere in the list are not affected by the
suffix. In Crowder and Morton's terms, the suffix harms only those items in a
"precategorical acoustic store."

Data from the suffix condition of the present experiment are shown in
Figure 6, where they are contrasted with the data of Figure 5 from the control
(no suffix) condition. For language-bound subjects, the suffix hurt performance
only for items at the end of the list. For stimulus-bound subjects, however,
the suffix reduced performance throughout the list; nevertheless their overall
performance remained very high. The differential effects of the suffix on the
two groups suggest that we may be dealing with qualitative differences in memory.

The suffix data again appeared to show differences in the shapes of the
serial position curves for the two groups. The stimulus-bound curve was
approximately flat, while the language-bound curve showed the usual serial
position effect, although with the recency effect weakened by the suffix.
However, again it is difficult to make shape comparisons since the overall level
of performance was so different for the two groups.

Overall performance transformations. In order to make between-group
comparisons of curve shape more meaningful, the data from both the control and
suffix conditions were transformed to take into account the differences in
overall performance level. For each group in each condition, percent correct for
a given serial position was divided by the overall percent correct on that
entire test. The results of these transformations are shown in Figure 7. If
there were no differences in memory for the various portions of the list, then
the resulting figures ought to be about 11% at each of the nine serial
positions, regardless of overall performance level. Stimulus-bound subjects
clearly showed this "flat" pattern of results in the control condition, and a
pattern very close to it in the suffix condition.³ Language-bound subjects, in
contrast, continued to show serial position effects in both conditions. Since
there are clear differences in curve shape between the groups in both
conditions, even when differences in performance level are taken into account,
the possibility that language-bound and stimulus-bound subjects possess
qualitative differences in memory receives additional support.

³If ceiling effects were concealing the serial position effect in the
stimulus-bound control condition, as suggested earlier, then the suffix data
ought to show evidence of the classical curve, since there were sufficient
overall errors to yield a differential distribution over the beginning, middle,
and end of the list. Since both stimulus-bound curves are essentially flat, the
ceiling effect argument is weakened. Instead, it appears that these subjects
have comparable memory levels over all portions of the list.
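
The transformation described above can be sketched numerically. One assumption
is needed to reproduce the stated 11% baseline: the divisor is taken to be the
total of the nine per-position percentages (nine times the mean), so that a
perfectly flat curve yields 100/9, or about 11%, at every position. That reading,
the function name, and the bowed example curve are illustrative assumptions, not
the author's stated computation or data.

```python
def position_shares(percent_correct_by_position):
    """Express each serial position as a percentage of the summed performance.

    A flat curve gives 100 / n at every position (about 11% for n = 9),
    whatever the overall level of performance.
    """
    total = sum(percent_correct_by_position)
    if total == 0:
        raise ValueError("no correct responses to distribute")
    return [100.0 * p / total for p in percent_correct_by_position]


# A flat curve at the 88% level reported for stimulus-bound subjects.
flat = position_shares([88] * 9)

# An invented bowed curve, for contrast only (not data from the study).
bowed = position_shares([90, 80, 65, 45, 28, 40, 60, 75, 85])

print([round(s, 1) for s in flat])
print([round(s, 1) for s in bowed])
```

Because every curve is rescaled to sum to 100, differences that remain after the
transformation reflect curve shape rather than overall level.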

Figure 6: Control vs. suffix conditions for language-bound and stimulus-bound

DISCUSSION



What is responsible for the group differences in digit-span performance? A
satisfactory explanation will have to wait until further parametric variations
have been completed (for example, those dealing with rate of presentation and
list length). However, some speculations can be made on the basis of the present
data.

One code or two? Stimulus-bound subjects may translate the spoken digit
string into a companion visual representation. The visual image of the string
would be arrayed spatially. It is well known that spatial tasks are handled by
the nonlanguage hemisphere of the brain, which for most people is the right
hemisphere. Hence stimulus-bound subjects would be employing the spatial
processing abilities of the right hemisphere in addition to the verbal
processing capabilities of the left hemisphere. The existence of two storage
codes, visual and verbal, for the digit strings could facilitate overall
retention. While there is no direct evidence for this interpretation from the
present experiment, elsewhere we have shown that stimulus-bound subjects are
superior on a visual search test that emphasizes spatial flexibility (Day, in
preparation).

Duration of storage systems. Another way to view the present data is to
consider the duration of various storage systems. Most current models of
immediate memory include an initial store in which spoken items are held in a
relatively unanalyzed form, for example, the precategorical acoustic store (PAS)
of Crowder and Morton (1969). Items must undergo phonetic processing in order to
enter short-term memory (STM) where about seven (plus or minus two) chunks can
be held for several seconds (Miller, 1956). Stimulus-bound subjects may be able
to transfer items from PAS more quickly, and hence be able to "read out" all the
items from a single store, STM, once stimulus presentation has been completed.
Language-bound subjects may have a slower transfer system, so that when stimulus
presentation is completed, some (early) items are in STM, other (late) items are
still in PAS, and the remaining (middle) items are in transition between the two
systems. Read-out of items from STM is no problem, and items in PAS are followed
through to STM and reported accurately. However, the "transition" items often
get lost. Experiments currently under way on rate of digit presentation will
help clarify whether there are differences in processing speed between
language-bound and stimulus-bound subjects.

Conclusion. Language-bound and stimulus-bound subjects showed clear
differences in a short-term memory task. These differences are at least
quantitative, and may also be qualitative. They suggest that the two groups of
subjects identified in the dichotic fusion tasks are not simply byproducts of a
particular experimental situation. Instead they may represent differences that
play an important role in general cognitive functioning.


On Learning "Secret Languages"*
Ruth S. Day+

Haskins Laboratories, New Haven, Conn.



OO-DAY OO-YAY OH-NAY UHT-WAY IS-THAY EZ-SAY? For many people, the
answer to this question is ES-YAY. DUH-GOO YUH-GOO NUH-GO WUH-GUT THUH-GIS
SUH-GEZ? For most people, the answer to this question is I HAVEN'T THE FOGGIEST.
Both utterances are examples of "secret languages": the first is very common
and is known as Pig Latin, while the second is rare and is known as G-language.¹

Secret languages are known to occur in many of the world's languages.
Children often invent them to talk among themselves without comprehension by
adults or by other children not in their immediate group. In the Philippines,
courting adolescent couples have difficulty achieving physical intimacy, as
they are closely watched by their chaperone; hence they use secret languages to
gain verbal intimacy (Conklin, 1956). In Surinam, small groups of teenage boys
or young men use secret languages to establish peer group solidarity (Price and
Price, in press). In some cultures, skill in linguistic play is highly valued
and is used more for entertainment's sake than for concealment, as in the
"talking backwards" language of the Cuna Indians in Panama (Scherzer, 1970).

Secret languages usually begin with the native language and add a few new
rules. In Pig Latin the basic rules are:

1. for each word, delete the first consonant (or consonant
cluster)²

2. utter the remainder of the word

3. add the deleted consonant, followed by the vowel AY.
Therefore the word SECRET becomes EEKRUT-SAY. In G-language, the rules are:



*Paper presented at the Eastern Psychological Association meeting, Washington,
D. C., 3 May 1973.

+Also Yale University, New Haven, Conn.

¹The author's father is gratefully acknowledged for inventing G-language; it has
yielded interesting data in the secret language experiment as well as some
enjoyable family communication.

²There are additional rules for items that begin with vowels.

[HASKINS LABORATORIES: Status Report on Speech Research SR-34 (1973)]

1. for each syllable, utter the first consonant (or consonant
cluster)² followed by the vowel UH

2. add G before the next vowel and continue with the rest
of the syllable.

Therefore the word SECRET becomes SUH-GEE-KRUH-GUT.
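
The Pig Latin and G-language rules above can be written as two small functions.
This is a minimal sketch under stated assumptions: the caller supplies each unit
already split into its initial consonant (or cluster) and its remainder, spelled
in the same informal sound-based notation used in the examples, and
vowel-initial items (which need the additional rules mentioned in the footnote)
are rejected. The function names are hypothetical.

```python
def pig_latin(onset, remainder):
    """Word-level rules: drop the onset, say the rest, then add onset + AY."""
    if not onset:
        raise ValueError("vowel-initial items need additional rules")
    return f"{remainder}-{onset}AY"


def g_language(syllables):
    """Syllable-level rules applied to a list of (onset, rest) pairs.

    Each syllable becomes onset + UH, followed by G + the rest of the syllable.
    """
    parts = []
    for onset, rest in syllables:
        if not onset:
            raise ValueError("vowel-initial items need additional rules")
        parts.append(f"{onset}UH")
        parts.append(f"G{rest}")
    return "-".join(parts)


print(pig_latin("S", "EEKRUT"))
print(g_language([("S", "EE"), ("KR", "UT")]))
```

Passing pre-segmented input keeps the sketch honest: finding syllable and
cluster boundaries from ordinary spelling is a separate problem that the rules
themselves do not address.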

Secret Language Experiments

Recently, we have been devising new secret languages that add rules at
various levels of linguistic analysis. We then teach these languages to adults
to see the extent to which they can operate on language at different linguistic
levels. The following is a recorded passage in one of these new secret
languages; see if you can determine what the rules are:

HERRO. THIS IS A TLANSFOLMED RANGUAGE. HOPEFURRY, YOU WIRR
BE ABER TO SPEAK IT. IT'S NOT VELY HALD TO DO, ONCE YOU FIGULE
OUT HOW TO DO IT. UNTIR YOU KNOW THE LURES THOUGH, YOU WIRR HAVE
TLOUBER UNDELSTANDING IT.³

This R-L language was devised for experimental purposes. Its rules are:

1. every time there is an /r/, change it to /l/

2. every time there is an /l/, change it to /r/.

Thus ROCKET becomes LOCKET, LAVISH becomes RAVISH, and CASSEROLE becomes
CASSELORE. Note that R-L language involves new rules solely at the phoneme
level; there are no changes at the syllable level as in most secret languages.⁴
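
As a rough illustration, the two rules can be applied as a simultaneous swap. The
sketch below works on letters rather than phonemes, which is only an
approximation of rules that are stated over sounds; it happens to reproduce the
three examples just given, but it will not always match a phoneme-based
transformation. The function name is hypothetical.

```python
# Swap R and L in both cases; every other letter is left unchanged.
_RL_SWAP = str.maketrans("RLrl", "LRlr")


def to_rl_language(text):
    """Letter-level approximation of the two R-L rules."""
    return text.translate(_RL_SWAP)


for word in ("ROCKET", "LAVISH", "CASSEROLE"):
    print(word, "->", to_rl_language(word))
```

Because the swap is its own inverse, applying `to_rl_language` twice returns the
original text, so the same function also decodes the secret language.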

In a typical word translation experiment, the subject is given a standard
English word and asked to translate it into the secret language version. The
following tape recorded passage taken from an experiment session illustrates
the procedure:



³To facilitate reading, secret language utterances are given in orthographic
rather than phonemic notation. Note that this notation does reflect the
phonemic form of the secret language transformation rather than being a strict
letter replacement system. For example, the phonemic representation of the
standard and secret versions of the word TROUBLE are /trʌbəl/ and /tlʌbər/;
the notation for the secret language form is TLOUBER rather than TLOUBRE
which would be pronounced /tlʌbrə/. When phonemic notation is needed, it is
given between slashes.

⁴Phoneme substitutions do occur in some secret languages, for example Le bolite
spoken in Haiti (Alexis, 1966).






STIMULUS              SUBJECT C.F.

LOCKET                ROCKET
CASSEROLE             CASSELORE
PICKLE                PICKER
MIDDLE FINGER         MIDDER FINGLE
LIVER                 RIVEL
LAWYER                ROYAL
NELSON ROCKEFELLER    NERSON LOCKEFERREL



This subject responded quickly and accurately; she had no difficulty making the
appropriate substitutions. After the word transformation task was completed,
she was asked to produce some connected discourse in secret language form. We
asked for a well-known passage in order to decrease task demands; the passage
was "Mary had a little lamb." Here is the same subject reciting this nursery
rhyme in secret language form:

MAILY HAD A RITTER RAM, ITS FREECE WAS WHITE AS SNOW. AND EVLIWEL
THAT MAILY WENT, THE RAM WAS SHULE TO GO.

Note that all /r/'s and /l/'s were transformed appropriately, and that the
whole passage was produced in a highly fluent fashion: pacing and intonation
resembled those of ordinary speech.

Now consider another subject working on one item in the word translation
task:



EXPERIMENTER: BRAMBLE
SUBJECT D.Q.: BRAIR...BRAIR...RORE
EXPERIMENTER: Not quite...BRAMBLE
SUBJECT D.Q.: ...BERLAIRM...LULL...LORE
EXPERIMENTER: Not quite...
SUBJECT D.Q.: BERLER...BLERM...BULL...BULL
EXPERIMENTER: What are the two rules?...R goes to L...
SUBJECT D.Q.: ...R to L, and L to R
EXPERIMENTER: ...BRAMBLE
SUBJECT D.Q.: BLER...BLERM...BUR
EXPERIMENTER: That's close. BRAM would be?...
SUBJECT D.Q.: ...BLERM
EXPERIMENTER: BLAM...You're just transforming R's and L's, okay?
SUBJECT D.Q.: Right. And BRAMBLE, I'm transforming the R to L. So it's
              BLER...BLER...BLERM. And BULL is...the L to the R...it's BER.
EXPERIMENTER: ...You're still sticking an R in; you said BLERM. Where do you
              get BLERM? It was BRAM...
SUBJECT D.Q.: Yah, BRAM. And the R in BRAM goes to L. So it's BL-...BLERM.
EXPERIMENTER: BLAM! Just plain BLAM. See, you're still putting an R in next to
              the M...
SUBJECT D.Q.: BRAM, but you have B-R. And the R goes to L. So I'm trying to
              delete that R, and make it B-L rather than B-M.
EXPERIMENTER: Rather than B-R.
SUBJECT D.Q.: Rather than B-R, right. So I'm trying to say BLERM...BLERM.
EXPERIMENTER: Where do you get the R? It's BRAM originally, B-R-A-M...
SUBJECT D.Q.: B-R-A-M. Doesn't that go to B-L-A-M?
EXPERIMENTER: Right! How do you pronounce B-L-A-M?
SUBJECT D.Q.: BLAM.
EXPERIMENTER: Right. Okay, let's go ahead...





This subject had great difficulty with the task. Even though he could state the
two rules, he was unable to employ them effectively. He worked on this item for
almost 3 minutes and never successfully transformed the whole word; finally, the
experimenter went on to the next item. I will spare you his NELSON ROCKEFELLER.

These two subjects illustrate the wide range in ability that individuals
have in learning the R-L language. The topic of individual differences will be
discussed more fully later.

The Present Secret Language Experiment: General Findings 

Method. In the present experiment, 63 Yale University students learned
R-L language. After hearing a brief recorded passage in secret language form
(as given above), they were told the two rules. They then took a word
translation test. All 24 stimulus items were acceptable English words, but half
yielded words (W) and half nonwords (NW) in their secret language versions.
Sample items for the W → W case were ROCKET → LOCKET and LAWYER → ROYAL.⁵ Sample
items for the W → NW case were BRAZIL → BLAZIR and LIVER → RIVEL. There were 43
target phonemes (/r/ and /l/) in all, since some items contained multiple
targets (e.g., BRAZIL). Subjects were instructed to transform only the /r/'s and
/l/'s, and to keep all other sounds constant.⁶ After completing the word
translation task, subjects recited "Mary had a little lamb" in its R-L version.
Each subject was tested individually. All responses were tape recorded and later
transcribed into phonemic notation.



⁵The arrow should be read "yields."

⁶Admittedly, there are some obligatory phonetic changes in neighboring units as
/r/ and /l/ replace each other. For example, the vowel before the final liquid
must be qualitatively different in BRAZIL and its secret language version,
BLAZIR. In such cases, the phoneme that was closest to the original but which
was still permissible in the new phonetic context was taken as the "correct"
form.

Error analysis. There were three major types of errors: 1) failure to
transform a target phoneme (e.g., LIVER → ^IVER rather than RIVEL); 2) phonemic
changes in nontarget phonemes (e.g., OFFER → OHFUL rather than AWFUL);
3) intrusions (e.g., BRAMBLE → BLAMBLER). A composite error score for each
subject was obtained by summing the number of all types of errors. The average
number of errors was 21 per subject.

Output tempo. Another interesting aspect of the secret language responses
was output tempo, which is illustrated in Figure 1. At the top of the display
is a diagrammatic representation of a stimulus item. The time line moves from
left to right while the hatch marks indicate that audible sound is being
produced. While this is a very crude representation of speech, it does serve the
present analysis. There were two general kinds of output. 1) Global response:
there was only a brief pause following stimulus presentation; then the entire
item was uttered in transformed version. 2) Sequential response: there was a
fairly long pause after the stimulus item; then the subject gave part of the
response, paused, gave some more, sometimes paused again, and finally finished
his response. The following taped passages illustrate these two forms of
response. First, a subject who characteristically gave global responses:



STIMULUS              SUBJECT T.C.

LEVER                 REVEL
BRAMBLE               BLAMBER
LAVISH                RAVISH
MIDDLE FINGER         MIDDER FINGLE



This subject barely paused after the stimulus item, then produced the whole item
in secret language form. Next is a subject who characteristically gave
sequential responses:



STIMULUS              SUBJECT R.C.

PICKLE                ...PICK...KER
LIKELY                ...RIKE...RY
BRAMBLE               ...BLAM...BER
MIDDLE FINGER         ...MIT...TER...FING...GLE



This subject paused for a fairly long time before beginning his response and
gave small response units interspersed with additional pauses.

Orthographic influences. Some subjects seemed to stay within the auditory
mode: given the sound units of the stimulus, they changed the appropriate units
and gave the resulting phoneme string as their response. Others seemed to
convert the input phonemes into an orthographic representation before any
transformations were made. Figure 2 illustrates the two approaches for the
stimulus word LITTLE. The phonemic representation of the stimulus is shown on
the left side of the display; the "T" sound is actually a flapped /t/ which
sounds like /d/. One way to transform LITTLE into R-L language is RIDDER. This
transformation stays entirely within the sound mode: the initial /l/ is
transformed to /r/, the final /l/ → /r/, such that the output is /rɪdər/. Other
subjects give /rɪtrə/ as their response. Given the input /lɪdəl/, they appear to
transform it into an orthographic mode of representation: L-I-T-T-L-E.
Transformations are then made on letters; the letter "L" is replaced by "R" in
two locations, yielding R-I-T-T-R-E. This new orthographic representation is
then turned back into a sound representation, such that the subject says /rɪtrə/
rather than /rɪdər/.
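
The two routes just described can be contrasted in a short sketch. The same swap
is applied either to the heard phoneme sequence or to the letters of the
spelling; the phoneme symbols, the variable names, and the function are
illustrative assumptions, and the final step of reading the new spelling aloud is
not modeled.

```python
SWAP = {"r": "l", "l": "r"}


def swap_liquids(units):
    """Apply the R-L rules to a sequence of units (phonemes or letters)."""
    return [SWAP.get(u, u) for u in units]


# Sound route: operate directly on the heard phonemes of LITTLE.
heard = ["l", "ɪ", "d", "ə", "l"]
sound_route = swap_liquids(heard)

# Spelling route: convert to letters first, then transform the letters.
spelled = list("little")
letter_route = swap_liquids(spelled)

print("".join(sound_route))
print("-".join(letter_route).upper())
```

The sound route keeps the flapped consonant and the vowel before the final
liquid, while the spelling route produces a letter string whose pronunciation
places the /r/ before the final vowel.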

Individual Differences 

Earlier we pointed out the wide range of individual differences in success
of making /r/ → /l/ transformations. Were there systematic individual
differences in the present experiment? In order to answer this question, a brief
discussion of systematic individual differences in another paradigm is necessary.

Studies of dichotic fusion. Large and systematic individual differences
have been obtained in the dichotic fusion situation (Day, 1969). Briefly, a
different message is presented to each ear at the same time over earphones and
subjects are asked to report "what they heard." The dichotic items are of the
general form BANKET/LANKET. On a substantial proportion of trials, fusions
occur: subjects report hearing BLANKET. The extent to which a population of
subjects fuses is of primary interest for the present discussion. Each subject
is given a fusion score, which is simply the percent of trials on which he gave
a fused response such as BLANKET. We then look at the number of subjects who
achieved each score across the full range of scores. Ordinarily, in most
psychological tests, a normal distribution is obtained, whether the scores
represent percent correct, number of trials to criterion, or a variety of other
measures. However, dichotic fusion scores are not normally distributed, but are
instead bimodal in nature:
some subjects fuse most of the time, while others rarely fuse. Since the first 
bimodal distribution of fusion scores was obtained (Day, 19^0), the effect has 
been replicated many times over several hundred subjects. 
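The fusion score defined above is a simple percentage, and a bimodal population can be separated with a single cutoff. The sketch below is illustrative only: the response lists are invented, and the 50% cutoff is an assumption, not a criterion reported in the text.

```python
def fusion_score(responses, fused_form):
    """Percent of trials on which the subject reported the fused form."""
    if not responses:
        raise ValueError("at least one trial is required")
    fused = sum(1 for r in responses if r == fused_form)
    return 100.0 * fused / len(responses)


def split_by_score(scores, cutoff=50.0):
    """Split subjects into high and low fusers at an assumed cutoff."""
    high = {name for name, score in scores.items() if score >= cutoff}
    low = set(scores) - high
    return high, low


# Invented responses to the dichotic item BANKET/LANKET.
trials = {
    "S1": ["BLANKET", "BLANKET", "BLANKET", "BANKET"],
    "S2": ["BANKET", "LANKET", "BANKET", "BLANKET"],
}
scores = {name: fusion_score(r, "BLANKET") for name, r in trials.items()}
high, low = split_by_score(scores)
print(scores)     # {'S1': 75.0, 'S2': 25.0}
print(high, low)  # {'S1'} {'S2'}
```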

Another task, which uses the same dichotic tapes, asks the same subjects
to determine which phoneme begins first on every trial. For an item like
BANKET/LANKET, the /b/ begins first by a short interval (e.g., 25-150 msec) on
half the trials, while the /l/ begins first by the same intervals on the
remaining trials. The high fusers report hearing the /b/ first on most of the
trials. Note that in English /bl-/ can occur in initial position but /lb-/
cannot. Therefore the high fusers are reflecting the phonological rules of
English; they are not reflecting the true stimulus conditions. These subjects
have been termed "language-bound" since they are bound by the facts of their
language so that they cannot judge the target stimulus events accurately. When
the low fusers are asked to judge temporal order they are highly accurate,
whether the /b/ or the /l/ began first. These subjects have been termed
"stimulus-bound" since they are accurate judges of the target stimulus events.

Current secret language experiment. Some of the subjects in the present
experiment also took the dichotic fusion tests. Eleven were classified as
language-bound and eleven as stimulus-bound. There were clear differences in
secret language facility between the two groups. The data given below are for
the word translation task; a similar pattern of results was obtained on the
translations of "Mary had a little lamb."

The language-bound subjects made almost twice as many errors: their composite
error score was 23 errors per person as compared with 13 for the
stimulus-bound subjects. Most of the differences between the two groups occurred on
W — Items where the composite error scores were 20 and 10, respectively. 

Over all items, language-bound subjects gave more global than sequential
responses, while stimulus-bound subjects gave an equal number of both types of
output tempo. Again, the type of item made a difference. For language-bound
subjects, the percent of responses that were global vs. sequential was 80% vs.
20% for W → W items and 48% vs. 52% for W → NW items. For stimulus-bound
subjects, these figures were 60% vs. 40% for W → W items and 37% vs. 63% for
W → NW items. Thus, while both groups of subjects gave more global responses on
W → W items, it was only the stimulus-bound subjects who gave more sequential
responses on W → NW items. It might be of interest to teach language-bound
subjects to use a sequential strategy on W → NW items. While their performance
may improve somewhat, they still may do poorly, as illustrated by the attempts
described above to get Subject D.Q. to transform BRAMBLE syllable-by-syllable.

Language-bound subjects gave four times as many responses that reflected
orthographic influences. They were thus less able to make transformations solely
at the sound level, independent of written structure. These are the same
subjects who had difficulty in making temporal order judgments in the dichotic
fusion situation independent of phonological rules. The stimulus-bound subjects
were not misled by orthographic conventions in the secret language experiment,
nor were they misled by phonological constraints in dichotic fusion tests.

Discussion. Perhaps Saussure's (1915) notion of la langue (language) and
la parole (speech) is helpful in understanding the two groups of subjects.
Stimulus-bound subjects are able to track the "speech" end, that is, the actual
performance aspects of an utterance. Language-bound subjects, on the other hand,
perceive an utterance through "language," that is, through the abstract structure
of their language.

The secret language experiment can be used for at least two types of
research. 1) "Psychological reality." Secret language rules can be added at
various levels of linguistic analysis, for example, the phonetic, phonological,
syllabic, syntactic, or semantic levels. The relative ease with which subjects
can make these transformations may reflect the extent to which each level or
type of rule is psychologically "real." 2) Individual differences. The secret
language experiment is well adapted to studying individual differences in
language ability. The ease with which individuals can operate on linguistic
structure may well have predictive value for foreign language learning.

In conclusion, I introduce you to the secret language experiment as a new
tool for studying psycholinguistic phenomena. It also happens to be an enjoyable
experience, both for the subject and the experimenter.


Hemispheric Specialization for Speech Perception in Six-Year-Old Black and
White Children from Low and Middle Socioeconomic Classes

M. F. Dorman and Donna S. Geffner 

ABSTRACT 



Six-year-old black and white Ss from low and middle socioeconomic
classes (SEC) were presented a dichotic listening task composed of
syllable pairs. All groups evidenced a significant right-ear
advantage (REA) at recall. The magnitude of the REA did not differ as a
function of race or SEC. The magnitude of the REA averaged over all
Ss was similar to that of adults.

On dichotic listening tests with adults, when verbal stimuli are presented
simultaneously to the left and right ears, the stimuli presented to the right
ear are recalled better than those presented to the left ear (Kimura, 1961a;
Bryden, 1963; Broadbent and Gregory, 1964; Curry and Rutherford, 1967;
Shankweiler and Studdert-Kennedy, 1967). This right-ear advantage (REA)
presumably reflects the functional prepotency of the contralateral auditory
pathways and the left hemisphere's specialization for the perception of speech
(Kimura, 1961b; Milner, Taylor, and Sperry, 1968; Studdert-Kennedy and
Shankweiler, 1970).

In children the REA appears to vary as a function of age, sex, and
socioeconomic class (SEC) background. Kimura (1967) found that low SEC females
and high SEC males and females evidence a REA at age five, whereas low SEC males
do not evidence a REA until age six. Recently Geffner and Hochberg (1971) have
reported a large (age) X (SEC) interaction in the development of the REA.
Four- to seven-year-old Ss from both low and middle SEC backgrounds were
presented a dichotic digits task (cf. Kimura, 1963). The middle SEC Ss evidenced
a REA at all ages. The low SEC Ss did not evidence a REA until age seven. These
data led Geffner and Hochberg to speculate that children from low SEC
backgrounds may not develop left-hemisphere specialization for speech at the
same rate as children from more privileged SEC backgrounds.

The Geffner and Hochberg data are very striking for they suggest that
cortical lateralization of function, which has been thought to be maturationally
determined (Lenneberg, 1967), may be radically slowed down by environmental
deficiencies during development. The nature of the environmental conditions
which determined the performance of the low SEC Ss is, however, not at all
clear. One nonenvironmental variable which may have affected the outcome of the
Geffner and Hochberg study was the large proportion of black children in the low
SEC group. Conceivably the delayed lateralization of speech found for the low
SEC population may have been a racial effect interacting with socioenvironmental
variables.

To clarify the effects of race and SEC as determinants of the REA in
children, in the present study, six-year-old black and white children from both
low and middle SEC backgrounds were presented a dichotic syllable test (cf.
Studdert-Kennedy and Shankweiler, 1970). To minimize a possible source of
experimental bias, the groups of children were tested by an examiner of their
own race.

Another purpose of the present study was to compare the REA of six-year-old
children with previously collected data on the REA in adults (Studdert-Kennedy
and Shankweiler, 1972). If, as Lenneberg (1967) has suggested, cortical
lateralization of function is not complete until approximately puberty, then we
may expect that six-year-old Ss would evidence a smaller REA than adults. If,
however, lateralization is complete by age five (cf. Krashen and Harshman, 1972)
then we may expect that the magnitude of the REA in six-year-old children and
adults would be similar.

METHOD 

Subjects. The Ss were 52 six-year-old children (C.A. 6.0-6.8): 26 white
Ss, 13 each from low and middle SEC; 26 black Ss, 13 each from low and middle
SEC. SEC was determined by Hollingshead's Two Factor Index of Social Position
(Hollingshead, 1965) which takes into account the parents' educational level and
occupational status. All Ss were right-handed (handedness tasks are detailed in
the Procedure) and had normal hearing with no known perceptual, neurological,
speech, or language deficit. Children with a bilingual background were not
selected. The Ss were matched as well as possible by class placement and
performance. Because intelligence quotients are not available in New York City
public schools, the authors obtained all information pertaining to the parents'
occupation and educational level, the home environment of the Ss, and classroom
performance from the classroom teacher, principal, guidance counselor, or parent.

Apparatus . The stimuli were reproduced on a Roberts 1920 stereo tape 
recorder via matched TDH-39 headphones. The output of each tape channel was 
calibrated and monitored by a Hewlett-Packard voltmeter. A 1000 Hz tone on both
channels of the test tape was used as a calibration signal. Audiometric
threshold tests were administered on a Haico HA-10 portable audiometer calibrated to
ISO standards. 

Preparation of stimuli. With the aid of the Haskins Laboratories'
computer-controlled parallel resonance speech synthesizer, the stop
consonant-vowel syllables /ba, da, ga, pa, ta, ka/ were generated. Each stimulus
was composed of three formants, and was 300 msec in duration. Under computer
control these six stimuli were then combined into the 15 possible pairs (no
stimulus was paired with itself) and were recorded dichotically in a fully
balanced order onto magnetic tape. The resulting tape contained 60 stimulus
pairs with each member of a pair occurring twice on each channel. The
interstimulus interval was 4 sec.
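The trial counts above follow from simple combinatorics, which the following sketch reproduces. It assumes that the 60 trials are the 15 unordered pairs in both channel assignments, each presented twice; the function name is hypothetical, and the sketch makes no attempt to reproduce the tape's actual presentation order.

```python
from itertools import combinations

SYLLABLES = ["ba", "da", "ga", "pa", "ta", "ka"]


def dichotic_trials(syllables, repeats=2):
    """Return the unordered pairs and the (left, right) trial list.

    Each pair appears in both channel assignments, `repeats` times each,
    so every member of a pair occurs `repeats` times on each channel.
    """
    pairs = list(combinations(syllables, 2))
    trials = []
    for _ in range(repeats):
        for a, b in pairs:
            trials.append((a, b))  # a on the left channel, b on the right
            trials.append((b, a))  # channels exchanged
    return pairs, trials


pairs, trials = dichotic_trials(SYLLABLES)
print(len(pairs), len(trials))  # 15 60
```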

Procedure. Each S was tested in a quiet room, most often the school
nurse's. All Ss were first given an audiometric threshold test. Hearing level
at 500 Hz, 1000 Hz, 2000 Hz, and 4000 Hz was assessed. If the hearing level
between the two ears differed by 10 db or more for two of the test frequencies,
the S was excused from further testing. Handedness was determined by asking the
Ss to perform three manual motor tasks: throwing a ball, cutting with scissors,
and drawing a circle. Any S who did not perform all three tasks with his right
hand was not tested further.

After the preliminary examination, the Ss were presented binaurally three
repetitions of the syllables /ba, da, ga, pa, ta, ka/. The Ss were instructed
to listen with both ears and report the syllable heard. Any S unable to repeat
the six syllables after the third repetition of the list was excused from further
testing. Next, the Ss were presented three dichotic practice trials. Again the
Ss were instructed to listen with both ears and report the syllable heard.
(Since the Ss were not told there were two different stimuli on these and the
following dichotic trials, only one response was elicited.) The Ss were told
that these sounds would sound "funny" but to continue reporting them as before.
The Ss were then presented the 60-item test sequence, followed by a brief rest,
then the 60-item test again. To control for possible channel effects, the
headphones were reversed after each 60-item test. The black Ss were tested by a
black student assistant, while the white Ss were tested by a white examiner.

RESULTS

Each S's performance was scored in terms of the metric (R - L)/(R + L) x 100,
where R is the total number of items correctly reported from the right ear and
L is the total number of items correctly reported from the left ear (for a
discussion of this scoring technique see Studdert-Kennedy and Shankweiler,
1970). The mean score for each of the groups subcategorized by sex is shown in
Table 1.
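As a worked illustration of the scoring metric, the sketch below computes (R - L)/(R + L) x 100 for one listener. The function name and the example counts are hypothetical; positive values indicate a right-ear advantage and negative values a left-ear advantage.

```python
def rea_index(right_correct, left_correct):
    """Ear-advantage score: (R - L) / (R + L) x 100."""
    total = right_correct + left_correct
    if total == 0:
        raise ValueError("no items were reported correctly from either ear")
    return 100.0 * (right_correct - left_correct) / total


# Invented counts for a single S.
print(rea_index(45, 35))  # 12.5  (right-ear advantage)
print(rea_index(35, 45))  # -12.5 (left-ear advantage)
```

Because the difference is divided by the total correct, the score is not inflated simply because one S reports more items overall than another.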



TABLE 1: Magnitude of the REA for all groups in terms of (R - L)/(R + L) x 100.

[Table layout not recoverable from the source; surviving entries in source order.]

Males (n = 5)      9.69
Females
Females (n = 8)    16.72
Average (n =
Males (n = 4)
Males (n = 11)
Females (n =       9.54
Females (n = 2)    10.22
(n =               7.14
Average (n = 13)   10.16


Each (race) X (SEC) group's mean score was evaluated by t tests for
correlated samples against the hypothesis that there was no difference in
accuracy of report between the left and right ears. As shown in Table 2, all
groups evidenced a significant REA.



TABLE 2: Mean number of syllables correctly reported for each ear.

Group               N     Left     Right    t

black low SEC       13    35.00    42.92    3.50**
black middle SEC    13    34.76    44.86    2.26*
white low SEC       13    41.46    47.87    2.86*
white middle SEC    13    33.61    48.38    2.19*

*p < .05
**p < .01



To determine whether the magnitude of the REA differed as a function of
race or SEC, the individual scores were collapsed over sex into an analysis of
variance with race and SEC as treatment variables. Neither the race
(F(1,48) = .021, p > .05) nor the SEC (F(1,48) = 1.009, p > .05) main effect
was significant. The (race) X (SEC) interaction, suggested by the reduced REA
for the low SEC white Ss, was not significant (F(1,48) = .405, p > .05).

Statistical analysis of male-female differences was not attempted because
of the generally small sample size, especially for males (n = 2) in the black
low SEC condition. Because two of the four male white low SEC Ss evidenced
rather large left-ear advantages, no overall ear advantage occurred for this
group. There is no reason to suspect that the obtained REA is representative
of the entire white male low SEC population.

DISCUSSION 

Racial Factors in REA 

No difference in magnitude of the REA was found between black and white Ss.
A similar outcome is reported by Sadick (in preparation). Black and white five- and
seven-year-old Ss were presented a dichotic syllable test similar to that used
in the present study. At both the five- and seven-year levels, both black and
white Ss evidenced a REA. The magnitude of the REA did not differ between the
groups. We may then tentatively conclude that the rate of cerebral
lateralization of function does not vary as a function of racial origin. This
conclusion is, of course, limited to the racial groups and SEC environments
studied.

SEC Factors in REA 

The presence of a significant REA in the low SEC groups, although
conflicting with the outcome of Geffner and Hochberg (1971), is consistent with
the outcome of a recent study by Knox and Kimura (1970). These investigators
assessed cerebral lateralization for speech and nonspeech sounds in five- to
eight-year-old low SEC Ss. The Ss were presented both dichotic digits and
dichotic environmental sounds. In the digit condition, all age-sex groups
evidenced a REA. Thus, in Kimura's several studies of the REA in five-year-old
low SEC Ss (Kimura, 1967; Knox and Kimura, 1970) both males and females have
evidenced REAs, although males less consistently than females.

These data, paired with the significant REA in the six-year-old black and
white low SEC Ss found in the present study, suggest that at least some, perhaps
the majority, of low SEC Ss achieve left-hemisphere specialization for speech at
the same rate as higher SEC Ss.

One possible explanation for the difference in outcome between the present
study and that of Geffner and Hochberg (1971) is that the low SEC Ss examined by
the latter investigators may have been raised in more deprived environments than
those of the present study. Geffner and Hochberg argued that abnormal rearing
conditions may have resulted in a retarded rate of cerebral lateralization of
function. However, an alternative and somewhat less radical explanation is that
abnormal rearing conditions engender Ss who function at very low cognitive and
motivational levels. Such Ss might perform "indifferently" on a relatively
complex task like dichotic digits, especially when tested by someone not of
their own race. On this view it would be expected that Geffner and Hochberg's
four-, five-, and six-year-old low SEC Ss would evidence very low overall
performance levels on the digits task. An analysis of the Geffner and Hochberg
data has indicated that, indeed, the low SEC Ss reported only 53% of the total
possible digits, while the middle SEC Ss reported 62%. Thus the low SEC Ss
evidenced a significantly lower performance level (t(154) = 3.98, p < .001).
This outcome suggests that the absence of a REA in the low SEC Ss may have been
a "floor effect" (cf. Halwes, 1969) resulting from task difficulty and
motivational variables. To choose between the alternative explanations for the
absence of a REA, it would appear necessary to present four- and five-year-old
very low SEC Ss with a relatively simple dichotic test (e.g., dichotic
syllables) in a situation which would maximize the Ss' motivational level.
Until such data have been collected, the effect of rearing conditions on the
rate of cerebral lateralization of function remains unclear.

Development of the REA 

Lenneberg (1967) has suggested that the end of the critical period for
language acquisition and the terminus of cerebral lateralization of function
occur at approximately puberty. On this view, we may suspect that the magnitude
of the REA would increase until puberty.

The data from several experiments indicate, however, that the REA may not
systematically increase in magnitude between age five and adulthood. In the
present study of six-year-olds the magnitude of the REA averaged over groups was
11.08. An REA of this magnitude is well within the range of REAs found for
samples of the adult population tested with a dichotic syllable procedure.
Dorman and Porter (1971) found a REA (n = 10) of 12.0 while Studdert-Kennedy and
Shankweiler (1972) report a REA (n = 30) of 10.0. The proportion of Ss (.21)
with a left-ear advantage is also similar to that found for adult populations
(Studdert-Kennedy, personal communication).

In a developmental study of the REA, Berlin, Love-Bell, Hughes, and Berlin
(1972) administered a dichotic syllable test to male and female Ss 3 to 13 years
old. All age-sex groups evidenced a REA. Although the total number of correct
responses increased with age, the magnitude of the REA did not systematically
increase with age. From these data and those collected from adults using the
same test, Berlin et al. (1972) concluded that the REA may develop fully by age
five. Krashen and Harshman (1972) have also reached this conclusion after a
reanalysis of earlier dichotic listening data (Geffner and Hochberg, 1971;
Kimura, 1963; Knox and Kimura, 1970), taking into account changes in guessing
strategy with increased age. The "lateralization by age five" hypothesis also
appears consistent with clinical data on language disturbance following brain
injury (Krashen and Harshman, 1972).

Finally, a number of studies indicate that the cortical mechanisms
underlying the perception of speech may be lateralized in very young children,
perhaps even infants. Yeni-Komshian (personal communication) has found a REA in
some three-, four-, and five-year-old children (also see Nagafuchi, 1970), while
Molfese (1972), using an auditory evoked response technique, has reported larger
left-hemisphere than right-hemisphere responses to speech signals in infants.
Taken together, the studies cited above suggest that cortical specialization for
speech perception may be present in very young children and may be complete by
ages five to six.


Oral Feedback, Part 1: Variability of the Effect of Nerve-Block Anesthesia
Upon Speech

Gloria Jones Borden, Katherine S. Harris, and William Oliver



The effects of bilateral mandibular nerve blocks on speech were
judged by listeners and by transcribers. Seven adult male speakers
repeated under normal and nerve-block conditions 66 sentences heavily
weighted with consonant clusters known from pilot studies to be
vulnerable to nerve-block distortion. Listener judgments of the speech
revealed large magnitudes of subject variance. Although all subjects
reported loss of sensation, the effects on speech ranged from
completely unaffected to markedly affected. Distortions were noted by
narrow phonetic transcription in 23% of the data, most prominently in
/s/ clusters.

The question of whether skilled speech is an open loop system requiring 
little or no feedback from the periphery, or a closed loop system requiring sen- 
sory information to control the production is a provocative topic and basic to 
our understanding of speech patterning. One feedback channel, that of sensation 
from the oral cavity, can be studied by examination of the effects of sensory 
deprivation. One approach is to examine the speech of subjects in which an oral 
sensory deficit is pathological, but this method yields contradictory conclusions 
(Chase, 1967; McDonald and Aungst, 1970). It is difficult to obtain specific
information on the relationship between oral sensation and speech from clinical
cases, due to the multiplicity of handicaps.

A potentially productive way of studying the relationship between sensory
feedback from the oral area and articulation is to interrupt feedback by
blocking the trigeminal nerve in normal speakers.

It is frequently observed that after dental procedures involving nerve
blocks there is often a disturbance of clearly articulated speech until the
effect of the anesthesia has disappeared. It is understandable, therefore, that
investigators interested in afferent control of speech should block the sensory
nerves of normal speakers with anesthesia in order to study the relationship
between feedback from the oral area and articulation of speech. Presumably all
feedback channels are used to acquire language: audition, taction, and
proprioception. Do normal adult speakers need to depend upon these feedback
possibilities during ongoing speech, and to what degree or under what
circumstances does each channel play a role? McCroskey (1958) was the first to
report that blocking oral sensation with mandibular and infraorbital injections
of anesthesia had an adverse effect on articulation. Substitution and distortion
errors were reported (McCroskey, Corley, and Jackson, 1959). Ringel and Steer
(1963) confirmed the findings of McCroskey. It was assumed that the reason for
the articulatory deterioration was the interruption of a closed loop control
system. Locke (1968) questioned the technique, as it might have both motor and
sensory effects, but Schliesser and Coleman (1968) reported complete elimination
of tongue sensation as tested by oral stereognosis measures after mandibular
blocks and the application of a topical anesthetic to the anterior palate. They
also reported very little, if any, interference with the motor control needed to
lateralize the tongue or to perform diadochokinetic tasks. Several investigators
interested by the McCroskey study and the Ringel and Steer study attempted to
specify further the effects of the nerve block. Work was done on this subject
somewhat concurrently by Gammon, Smith, Daniloff, and Kim (1971), by Scott
(1970), and by the authors. Gammon and colleagues found a 20% rate of
misarticulation with anesthesia. Errors were more prominent in the labial and
alveolar regions, with fricatives and affricates especially distorted. Scott
noted that sibilants were less closely produced and other phonemes were
retracted but maintained the intended manner of articulation. The stimuli used
were 24 spondee words.

The purpose of this study was to investigate further the distortion of 
phonemes vulnerable to nerve block. Is the effect upon speech slight or severe? 
Does it affect subjects similarly? What phonemes are distorted? 

METHOD 

Two pilot studies, the first using a consonant-vowel-consonant (CVC)
balanced list and the second using words containing fricatives, revealed that
speech deterioration under nerve block was in many cases evident only in rapid,
connected speech. Sixty-five sentences were, therefore, constructed for the
final study (Borden, 1971). The subjects were seven university students, all
normal speakers of standard English. The recording was done in a quiet interior
room at the University of Pennsylvania School of Dentistry. Each subject had two
sessions. The conditions of normal and nerve block were rotated as far as
possible. In each session the subject repeated the sentences after listening to
a recorded speaker heard through earphones from a second tape recorder. The
anesthesia was administered by a dentist using the standard dental technique for
producing a mandibular block (Cook-Waite Labs, Inc., 1971). The puncture was
made at the apex of the pterygomandibular triangle which is about 7 mm above the
occlusal surface of the teeth. Half of the solution of 2% lidocaine was
deposited halfway back toward the wall of the mandibular sulcus. This usually
anesthetizes the lingual nerve. When the needle reached the ramus, the rest of
the solution was deposited around the inferior alveolar nerve. The method of
injection is schematized in Figure 1. The amount of anesthesia was 1.5 cc of
solution on each side. When subjects reported loss of sensation to the dentist's
probes of the tongue, taping began. In some instances, an additional 1.5 cc was
injected if needed. The anesthesia lasted for the session which took little more
than one-half hour.




Figure 1: Inner surface of ramus with needle in the right mandibular sulcus. 



Listening Test 

Thirty-eight utterance samples in the form of phrases were extracted from
the recorded material to construct a listening test. This test was used for
listener judgments of speech deterioration and for narrow phonetic transcriptions
by transcribers. The test was heavily weighted with utterances that from pilot
studies were found to be most vulnerable to the nerve block.

The tapes of each subject were presented to a group of listeners. Utterance
samples of both conditions had been spliced into matched pairs and randomized,
with one second of silence between the members of a pair and four seconds
of silence between pairs of utterances. The listeners were 16 university
students instructed to check "a" if the first example of each pair seemed more
deteriorated, or check "b" if the second example seemed more deteriorated. The
incorrect responses (checking normal condition as deteriorated) were counted and
tabulated according to speaker and according to utterance. Correspondence between
those utterances sampled during nerve-block conditions and the listener response
"more deteriorated" served as an index of nerve-block influence.




Phonetic Transcriptions 



Two experienced transcribers made narrow phonetic transcriptions of the
listening test tapes. The transcribers worked on material and speakers not
used in the study to standardize their phonetic system. It was decided that
the direction of the distortions should be indicated whenever possible. For
example, if the /s/ sounded somewhat like /θ/, the transcription would be /s®/,
whereas if it was more toward /ʃ/, it would be transcribed /s<^/. If the /s/
was slurred but in an undetermined direction, the indication was /^/.

RESULTS 

Listener Judgments 

Incorrect listener responses were tabulated according to speakers and
according to utterances. Two analyses of variance were conducted to investigate
variation among listeners and among speakers and further, the variation among
utterances according to listeners (Borden, 1971).

It was found that there was no significant difference among listeners.
Thus, the listeners were apparently using the same criteria in their judgments.

The variation among utterances according to speaker was significant at the
.05 level, indicating marginal significance. One speaker who evidenced no speech
distortions under nerve block was removed for this analysis. The total possible
"incorrect" listener responses for each utterance was 96 (6 speakers as heard by
16 listeners), of which 48 would be expected by chance, even without a nerve
block. In general the single consonants deteriorate less than the clusters,
since listeners have more trouble identifying the block condition.

The variation among speakers was found to be highly significant as judged 
by listeners. Since there were 38 utterances and 16 listeners, there were 608 
possible incorrect listener responses for each speaker, 304 expected by chance. 
It can be seen in Table 1 that the nerve block had no effect on speaker B (315 
incorrect responses) as determined by listener judgment. Speaker C, in contrast, 
was most affected, as the listeners made relatively few errors of judgment (96) 
between the normal and the nerve-block utterances. Speakers, then, varied 
considerably in their performance under nerve block as judged by listeners. Even 
when the speaker with no perceptible effect on his speech was removed for the 
second analysis, a significant variation among the remaining 6 speakers was found 
at the .01 level of confidence. The extent of this variation was surprising to 
the experimenters, as there had been no previous mention of interspeaker 
variance in levels of deterioration due to oral anesthesia. 

TABLE 1: Incorrect listener responses according to speaker. 

Speakers          B     A     E     F     G     D     C 

Total           315   246   228   222   218   193    96 

% Utterances     50    40    38    37    36    32    15 
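
The chance baselines quoted in this section follow directly from the design of 
the listening test. The sketch below reproduces that arithmetic; the function 
name and the 0.5 chance rate are assumptions made here for illustration, since 
the text states only the resulting counts. 

```python
def chance_baseline(n_items, n_listeners, chance_rate=0.5):
    """Return (possible, expected_by_chance) incorrect listener responses.

    chance_rate=0.5 is an assumption: it matches the reported figures
    (48 of 96, 304 of 608) but is not stated explicitly in the text.
    """
    possible = n_items * n_listeners
    return possible, possible * chance_rate


# Per utterance: 6 speakers, each heard by 16 listeners.
per_utterance = chance_baseline(n_items=6, n_listeners=16)
# Per speaker: 38 utterances, each heard by 16 listeners.
per_speaker = chance_baseline(n_items=38, n_listeners=16)

print(per_utterance)  # (96, 48.0)
print(per_speaker)    # (608, 304.0)
```

Read against these baselines, the 315 incorrect responses for speaker B sit 
close to the 304 expected by chance, while the 96 for speaker C fall far below. 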

Transcriber Judgments 

The transcribers made the transcriptions independently. Transcriber 
agreement was quite high. For the 228 utterances transcribed, transcribers agreed 
that there was no effect in 67% of the data. There was agreement both that there 
was a distortion effect and on the nature of that distortion in another 20% of 
the data, bringing the transcriber agreement up to 87%. Of the total number of 
utterances, 3% had transcriber agreement that there was a deviation, but the 
direction or place of the distortion in the utterance was judged differently. 
For the final 10% of the data, one transcriber heard differences which the other 
did not consider to be distortions. 

To determine if the transcribers were making judgments on utterances similar 
to the judgments made by the 16 listeners, the utterances were ranked according 
to transcriber judgments of deterioration to test the correlation of that ranking 
with the utterances ranked according to listener errors. The utterances were 
given scores to indicate their relative degree of distortion as interpreted by 
the two transcribers. An utterance received a score of zero if there was no 
difference noted by either transcriber between the normal and nerve-block 
condition in any speaker. A score of 1/2 indicated that a difference was noted 
by one transcriber in one speaker, and 3/4 indicated that a difference was noted 
by one transcriber in 2 speakers. Scores of 1, 2, 3, or 4 were assigned if there 
was transcriber agreement that there was a distortion in 1, 2, 3, or 4 speakers 
respectively. After each utterance was assigned a score, the utterances were 
ranked from the most affected by the block to the least affected. Table 2 shows 
the key words removed from their embedding phrases as ranked by transcribers and 
by listeners. Using Spearman's rank correlation, the ranking of utterances given 
by the transcribers correlated significantly at the .01 level of significance 
with the ranking of utterances given by the 16 listeners. 
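
The ranking and correlation step can be sketched as follows. This is a minimal 
illustration, not the authors' procedure: the function names are hypothetical, 
tied scores are given the mean of the ranks they span (which is how the tied 
transcriber ranks in Table 2 appear to have been assigned), and Spearman's 
coefficient is computed as the Pearson correlation of the two rank lists, a 
standard formulation that remains valid when ties are present. 

```python
def average_ranks(scores, descending=True):
    """Rank scores (1 = most affected); ties share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=descending)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(ranks_a, ranks_b):
    """Spearman correlation computed as Pearson's r on two rank lists."""
    n = len(ranks_a)
    mean_a = sum(ranks_a) / n
    mean_b = sum(ranks_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ranks_a, ranks_b))
    var_a = sum((a - mean_a) ** 2 for a in ranks_a)
    var_b = sum((b - mean_b) ** 2 for b in ranks_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all ranks are equal")
    return cov / (var_a * var_b) ** 0.5


# First six transcription scores from Table 2.
transcriber_scores = [4.5, 4.5, 4, 4, 3.5, 3.5]
print(average_ranks(transcriber_scores))  # [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
```

Passing the full transcriber and listener rank columns of Table 2 to 
`spearman_rho` would give the coefficient whose significance is reported above; 
the significance test itself is not sketched here. 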

The phonemes transcribed as distorted under nerve block were /tʃ/, /dʒ/, 
/s/, /z/, /ʃ/, /t/, and /l/. All of the /s/ two-consonant clusters were 
distorted, especially /st/. Among the /s/ three-consonant clusters only the final 
/kst/ remained undistorted by the block. The /s/ was the distorted portion of 
the cluster in all cases, with additional distortion on /r/ in two utterances 
with /spr/ and /skr/ clusters. There were no errors transcribed for the labials, 
the velars, the labiodentals, or for /d/ or /n/. 

All of the errors noted by the transcribers were errors of place. The 
errors were never sufficiently deviant to cross phoneme boundaries. The most 
prominent distortion was for the /s/ to deviate toward /ʃ/. In all cases the 
distortion seems to be the result of the tongue failing to reach target position 
or target precision. 

Speaker Variation 

Transcriber judgments according to speaker indicate, as did the listeners, 
that the speakers varied widely in the degree of speech deterioration evidenced 
in the sample utterances. Listeners and transcribers agreed that speaker C was 
the most affected, and that speaker B was not affected. As demonstrated in 
Table 3, speakers C and D were both affected, speakers F and G somewhat less 
affected, and speakers E and A were judged to have very little deterioration of 
speech. 

TABLE 2: Rank correlation between transcribers and listeners. 

               Transcription   Transcriber   Listener 
Utterance          Score          Rank         Rank 

spring              4.5            1.5           2 
stars               4.5            1.5           1 
scissors            4              3.5          12.5 
school              4              3.5          15 
squirrel            3.5            5.5          10 
watching            3.5            5.5          23 
spider              3              8             3 
whiskers            3              8            18 
scratch             3              8             6 
letters             2.5           10            20 
mouse               2             11.5          28.5 
string              2             11.5           7.5 
dishes              1.75          14            23 
snowballs           1.75          14             7.5 
giraffe             1.75          14            10 
blocks              1.5           18            26 
brushing            1.5           18            27 
bicycles            1.5           18            23.5 
grapes              1.5           18            16 
smoke               1.5           18             4.5 
smeplng             1             23             4.5 
sleeping            1             23            18 
It's                1             23            10 
kids                1             23            25 
splashing           1             23            21 
telephone            .75          26            12.5 
knife                .5           28            35 
swinging             .5           28            23 
shaving              .5           28            31.5 
table               0             34            14 
pajamas             0             34            18 
girl                0             34            31.5 
bird                0             34            34 
fixed               0             34            31.5 
birthday            0             34            31.5 
mother              0             34            37.5 
cans                0             34            36 
peanut              0             34            37.5 



TABLE 3: Speaker variation as judged by transcriber agreement 
of no distortion. 

Speaker                  B     E     A     G     F     D     C 

% Utterances Judged 
Not Distorted          100    89    82    74    58    55    45 











DISCUSSION 

Several results emerged from this perceptual study of the effects of 
bilateral mandibular nerve block upon speech. First, the effect was found to be 
subtle, limited, and manifest only in rapid, connected speech. In an array of 
utterances heavily weighted with consonant clusters, deterioration of 
articulation was noted by listeners in surprisingly little of the data. 
Transcribers agreed upon distortions in only 20% of the data (17% if the 
unaffected speaker is included), agreeing that there were no perceptible 
distortions in 67% of the utterances (71% when the unaffected speaker is 
included). 

The effect was discovered to be limited to certain phonemes. It would be 
distorting the facts to report that the effect was largely with the fricatives, 
because although /s/ was by far the most common phoneme to deteriorate, and /z/ 
and /ʃ/ also underwent changes, there was seemingly no effect upon /ð/ or /θ/ 
and very little upon /f/ or /v/. The affricates were affected, but the plosives 
suffered very little, with only /t/ noticeably slurred. The nasals were not 
noticeably affected. There was some distortion of /r/ and /l/, but the most 
conspicuous effect remained the /s/. 

Finally, the effect was found to be highly variable across subjects, a 
finding not mentioned in previous reports. Although all subjects reported com- 
plete loss of sensation in the anterior two-thirds of the tongue, the effects on 
speech ranged from completely unaffected to markedly affected. As one can see 
from referring to Table 3, the subjects varied from no effect to distortions in 
55% of the utterances sampled, with significant variation among the subjects be- 
tween the two extremes. 

The high degree of intelligibility of all of the speakers in this 
investigation gives some weight to the theory that skilled speech may be largely 
under open loop control. At least it can be concluded that with only one sensory 
channel inhibited in its function, the motor sequencing of speech remains 
essentially intact. Skilled speech remains highly intelligible whether under 
conditions of auditory masking of one's own speech or conditions of oral sensory 
nerve block. 






It may be that the systems used to monitor our own speech, specifically audition 
by air and bone conduction, vision, proprioception, and taction, are of primary 
importance during the learning of speech. After the process of speaking becomes 
relatively automatic and facile, we may monitor less and switch from one channel 
to another as we monitor. 

It is unclear why there is such variability of nerve-block effect among 
subjects. It might be a difference in muscle use. It might reflect differences 
in the nerve block in sensory versus motor effects. Another possibility is 
individual variation in dependence upon sensory or auditory feedback. Some 
speakers may have developed a more open loop speaking system than other speakers. 

An inherent variable in these investigations is the nerve-block injection 
itself. Despite subjective reports of loss of sensation, there are probable 
variations in depth of anesthesia and in the specific nerves affected. It would 
be advisable in future investigations to use more sophisticated methods of 
testing loss of sensation, both taction and kinesthesis. Electromyography should 
be used, in addition, to check motor function. The extent of the variability of 
speech effect after nerve-block anesthesia should caution researchers to avoid 
generalizations about the deterioration of articulation with sensory deprivation 
in the oral area. 
Oral Feedback, Part II: An Electromyographic Study of Speech Under Nerve- 
Block Anesthesia 

Gloria Jones Borden, Katherine S. Harris, and Lorne Catena 



Electromyographic recordings were made from the lip, tongue, and 
certain suprahyoid muscles of four normal adult speakers under nor- 
mal conditions and under conditions of trigeminal nerve-block anes- 
thesia. The mylohyoid muscle and the anterior digastric muscles 
which are innervated by motor fibers from the blocked nerve were 
usually depressed or inactive during the nerve-block condition. The 
assumption that the effects of this traditionally used nerve block 
are purely sensory seems unfounded. Other muscles are either de- 
pressed in activity during the block or more active than normal dur- 
ing the block. The amplitude of EMG recording depends upon depth and 
symmetry of anesthesia and upon the idiosyncratic reaction of the 
subject. Changes in muscle activity during the nerve block extend 
even to those muscles whose sensory and motor innervations cannot be 
affected by the block. Therefore, the effects observed indicate a 
more central effect or some compensatory reorganization. 



+Haskins Laboratories, New Haven, Conn., and City College of the City 
University of New York. 

++Haskins Laboratories, New Haven, Conn., and the Graduate Division of the 
City University of New York. 

+++University of Connecticut Health Center, Farmington, Conn. Currently, 
Southern Illinois University, Edwardsville. 

Acknowledgment: Part of this article summarizes a portion of a doctoral 
dissertation by the first author completed at the Graduate Division of the 
City University of New York under the direction of Katherine S. Harris (1971). 
The authors gratefully acknowledge the assistance of Victor Caronia, D.D.S., 
of Columbia School of Dental and Oral Surgery and Fredericka Bell-Berti of 
Montclair State College. Dr. Robert Ringel of Purdue University administered 
the sensory tests in Experiment II and offered many helpful suggestions 
throughout the work. Indispensable to these studies were Dr. Masayuki 
Sawashima and Dr. Hajime Hirose of the Faculty of Medicine, University of 
Tokyo, who inserted the electrodes. This research was supported in part by a 
grant from the National Institute of Dental Research to Haskins Laboratories. 

[HASKINS LABORATORIES: Status Report on Speech Research SR-34 (1973)] 






A series of studies during the 1950s and '60s dealt with the role of 
tactile feedback in speech. It was found that bilateral mandibular 
and infraorbital injections of anesthesia increased the number of judged errors 
in articulation of adult speakers (McCroskey, 1958; Ringel and Steer, 1963). 
The speech distortions were found to be subtle and were most evident in the 
production of fricatives and affricates (Scott, 1970; Borden, 1971; Gammon, Smith, 
Daniloff, and Kim, 1971). It was assumed by the investigators that the speech 
effect was primarily due to decreased sensory feedback as a result of blocking 
oral sensation from the tongue via the lingual nerve. A phonetic analysis of 
the speech effect under anesthesia revealed two factors which prompted further 
investigation: first was the variability of effect among speakers, with some 
subjects unaffected by the nerve block, although oral sensation was reported to 
be lost, and the second factor was the predominance of articulatory distortions 
among the sibilants and affricates, especially /s/ in consonant clusters, in 
those subjects who were affected (Borden, 1971). It was decided to study 
electromyographically the contraction of some of the muscles thought to be 
implicated in lingual movement under conditions of nerve block and under normal 
conditions. 

Four separate electromyographic (EMG) experiments were conducted in an 
attempt to find out what happens to certain suprahyoid and tongue muscles as 
subjects speak under conditions of trigeminal nerve block.¹ 

FIRST ELECTROMYOGRAPHIC STUDY 

Since the nerve block seemed to produce an /s/ effect, muscles which are 
thought to contribute to tongue elevation were reviewed (Van Riper and Irwin, 
1958; Hirano and Smith, 1967; Zemlin, 1968). The muscles which were accessible, 
clearly identifiable, and of interest for this study were the genioglossus, 
geniohyoid, mylohyoid, and the anterior belly of the digastric muscles. The 
orbicularis oris was included as a reference (Figure 1). 

Method 

The monopolar electrodes used were DISA concentric needle electrodes with a 
diameter of .45 mm. Needle placement was made through the cutaneous tissue 
under the chin to the depth required. Correct placement was checked by 
observation of an oscilloscope while protruding the tongue for genioglossus 
activity, saying "ta" for geniohyoid activity, lowering the mandible for 
digastric activity, and saying "ka" for mylohyoid activity. Correct placement 
was checked periodically throughout each run. 

The subject for the first experiment was a normal adult speaker. There 
were two experimental conditions: without nerve block, and with bilateral 
mandibular blocks. A total of 7.5 cc of 2% Xylocaine was injected by a dentist, 
3 cc in each side and an additional 1.5 cc on one side. The technique was 
similar to that used by McCroskey (1958), the model for all previous studies. A 
partial run was recorded with a medial nasopalatine block of 1 cc and an anterior 
palatine block of 2 cc added, but this part of the study was not analyzed, as the 
speech effects were not noticeably different from the run with the bilateral 
mandibular blocks alone. It seemed that loss of sensation from the anterior 
portion of the hard palate and the alveolar ridge adds very little to the speech 
effect from the mandibular block. 

¹These studies were conducted over a substantial period of time, during which 
electrodes, insertion techniques, and data analysis were substantially altered. 
In particular, the first experiment, in 1969, was performed under circumstances 
which permitted only relatively gross statements to be made about the results. 
Insertion techniques for the intrinsic tongue muscles were developed just 
before the third experiment was performed. 

For the EMG studies, material was selected from the utterances used in our 
previous work (Borden, Harris, and Oliver, 1973). Eleven utterances in sentence 
form, using the frame "It could be the ______," were used to permit 
normally paced connected speech. Each utterance was represented twice in a 
randomized list of 22 utterances. There were 10 such lists, each individually 
randomized. Each utterance was spoken 20 times during the course of one run. 
The utterances were: 



     It could be the snowballs splashing. 
     It could be the cat's whiskers. 
     It could be the fixed sweater. 
     It could be the school blocks. 
     It could be the thirsty wasp. 
     It could be the sleeping taxi. 
     It could be the spider string. 
     It could be the squirrel nest. 
     It could be the rooster scratch. 
     It could be the spring grapes. 
     It could be the stove smell. 



The 220 utterances for each run were printed and mounted on large cards which 
were flipped as the subject read them, with equal stress attempted on each of 
the final two words. 
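
The construction of the reading lists can be sketched as follows. This is only 
an illustration of the counts given above (11 utterances, each appearing twice 
in a list of 22, with 10 individually randomized lists per run); the function 
name, the fixed seed, and the use of a pseudorandom shuffle are assumptions, as 
the source does not say how the randomization was carried out. 

```python
import random
from collections import Counter

KEY_PHRASES = [
    "snowballs splashing", "cat's whiskers", "fixed sweater",
    "school blocks", "thirsty wasp", "sleeping taxi", "spider string",
    "squirrel nest", "rooster scratch", "spring grapes", "stove smell",
]


def build_run(phrases, repeats_per_list=2, n_lists=10, seed=0):
    """Return n_lists lists, each an independent shuffle of the phrases."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    run = []
    for _ in range(n_lists):
        one_list = list(phrases) * repeats_per_list
        rng.shuffle(one_list)
        run.append(["It could be the " + p + "." for p in one_list])
    return run


run = build_run(KEY_PHRASES)
counts = Counter(sentence for one_list in run for sentence in one_list)
print(len(run), len(run[0]), sum(counts.values()))  # 10 22 220
```

Each sentence occurs 20 times in the resulting run, matching the 20 tokens per 
utterance that are later averaged for each electrode. 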

A 16-channel magnetic tape was produced, recording the electrical output of 
the muscles. Recordings were monopolar; that is, the voltage difference was 
recorded between the active tissue of the muscles and the inactive tissue of the 
ear lobe. Some channels were used for audio signals, such as the utterances 
produced by the subject and the experimenters' comments for record-keeping. Each 
utterance was numbered by a pulse code laid down on the tape and eventually used 
for computer synchronization. 

A visual record of the EMG and audio channels was made for locating and 
inspecting the individual tokens. Each utterance was represented 20 times during 
each run, and a single point in time, the line-up point, was selected so that 
all of the tokens of a single type could be averaged by computer for each 
electrode. The line-up point was chosen at a point of particular interest and 
marked on the simultaneous audio recording of the subject. 

Each tape was checked with five computer programs: to verify that the code 
pulses were in order, to set the gains of the playback amplifiers at levels 
appropriate for the analog-to-digital converter, to make control tapes of the 
line-up points and distances from point zero for each utterance, to set each 
EMG channel at the optimum level, and finally to average the data on the control 
tapes. The three runs were hand-plotted (Harris, 1970). 
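
The averaging of tokens about a common line-up point can be illustrated with a 
short sketch. It is not the laboratory's program: the function name and the 
window arguments are hypothetical, and the samples are assumed to be already 
digitized and rectified so that a simple pointwise mean is meaningful. 

```python
def average_tokens(tokens, lineup_indices, before, after):
    """Average several tokens of one utterance, aligned at their line-up points.

    tokens         -- list of sample lists, one per token (same electrode)
    lineup_indices -- index of the line-up point within each token
    before, after  -- number of samples kept on each side of the line-up point
    """
    width = before + after + 1
    totals = [0.0] * width
    for samples, lineup in zip(tokens, lineup_indices):
        start = lineup - before
        if start < 0 or lineup + after >= len(samples):
            raise ValueError("window extends beyond the recorded token")
        for k in range(width):
            totals[k] += samples[start + k]
    return [t / len(tokens) for t in totals]


# Two toy tokens whose peaks fall at different positions on the tape.
token_a = [0, 0, 1, 2, 1, 0]   # line-up point at index 3
token_b = [0, 3, 4, 3, 0, 0]   # line-up point at index 2
print(average_tokens([token_a, token_b], [3, 2], before=1, after=1))
# [2.0, 3.0, 2.0]
```

Aligning at the line-up point before averaging keeps activity tied to the same 
articulatory event from smearing across tokens spoken at slightly different 
times. 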

Results and Discussion 

Inspection of the data revealed that, except for two muscles, the muscular 
activity recorded during speech under nerve-block conditions was similar in 
amplitude to that recorded during the normal condition. However, the activity 
observed on the oscilloscope of the mylohyoid muscle and the anterior belly of 
the digastric muscle after nerve-block injections dropped dramatically. The 
electrodes were checked and found to be in place, but as long as the 
anesthesia was effective those muscles were, in effect, paralyzed. The speech of 
the subject under nerve block revealed the typical mandibular block effect of 
distorted sibilants, the /s/ clusters being most prominently affected. For 
example, for the production of the utterance "sleeping taxi," Figure 2 shows the 
activity of the mylohyoid muscle and the anterior belly of the digastric during 
normal and nerve-block conditions. Graphs of all 11 utterances demonstrate the 
same drop in activity of these two muscles. 

A closer look at the anatomy of the injection area suggests a reason for 
this effect. The mandibular injection which has traditionally been used for 
these studies deposits half of the solution in the area of the lingual nerve, 
then moves on to deposit the rest of the solution in the area of the inferior 
alveolar nerve. Just before the inferior alveolar nerve enters the mandibular 
foramen into the mandibular canal, it gives off the nerve fibers of what is known 
as the mylohyoid nerve, the only purely motor component of the otherwise sensory 
inferior alveolar branch of the trigeminal nerve. The mylohyoid nerve is motor 
to the mylohyoid muscle and to the anterior belly of the digastric muscle, the 
two muscles which dropped in activity during the nerve-block condition. The 
anatomy of the area is indicated in Figure 1, Part I. 

The question was whether the inactivity of either of these muscles could 
have contributed to the noted speech deterioration. If the speech effect is 
primarily due to sensory loss, then loss of feedback from the tongue-tip region 
would probably be responsible. If it is due to motor loss, however, then the 
inactivity of the anterior belly of the digastric muscle and the mylohyoid 
muscle would probably be responsible. 

The normal function of the anterior belly of the digastric muscle is to 
open the jaw. EMG data on this muscle, obtained by recording muscle activity 
during simple consonant-vowel-consonant (CVC) utterances, showed no action for 
/i/ and /u/ and a large peak for /a/ (Harris, 1971). Since there was no 
perceptible speech effect of the nerve block upon vowels, and since the action 
of the anterior belly would not reasonably be expected to affect the apical 
gestures which deteriorated under the nerve block, it seems unlikely that its 
motor loss could have caused the speech effects observed. 

The normal activity of the mylohyoid muscle was found by both Harris (1971) 
and Smith (1970) to be highest for the production of /k/. Its contraction seems 
to lift the body of the tongue. In the more complex utterances of the present 
study, it can be seen that the mylohyoid muscle peaked normally in preparation 
both for the /s/ consonant clusters and for the velars (Figure 3). Notice the 
activity at the beginning of "spring," "spider," and "string," and at the end of 
"grapes" and "string," in the normal condition. The drop in activity of the 
mylohyoid muscle during the nerve-block condition is obvious. The peaks of 
activity under normal speaking conditions, then, coincided with production of 
the segments that were distorted under the nerve-block condition, with the 
exception of the velars. 

The velars were not perceived as distorted in the nerve-block condition. 
The production of /k/ remained intact, as had been reported in all previous 
nerve-block experiments. The explanation may lie in the comparatively gross 
production of /k/ and the fact that listeners accept for /k/ a less precise 
gesture than for /s/. 

It seemed, therefore, that the effective paralysis of the mylohyoid muscle 
might reasonably be related to the speech effect, since, for this subject, the 
mylohyoid muscle appears to be important in lifting and steadying the body of 
the tongue for consonant clusters, especially those with /s/ (Table 1). This 
subject produces /s/ with the tongue tip down, making it imperative that the 
body of the tongue be raised to produce the friction. Deprived of motor ability 
in the mylohyoid and deprived of lingual sensation, the /s/ clusters were 
distorted. It is impossible to conclude which of these factors, if not both, is 
responsible for the distorted speech. 

In summary, the clear conclusion of this first EMG experiment was that a 
motor component seemed to exist in what was previously assumed to be a sensory 
deprivation. The motor loss was evident in two of the suprahyoid muscles, the 
mylohyoid muscle and the anterior belly of the digastric muscle. One of these 
muscles, the mylohyoid, is normally active for this subject for /s/ clusters and 
velars. Since this subject produced /s/ with a high dorsum, it is reasonable to 
assume that the motor loss in the mylohyoid muscle may have contributed to the 
speech deterioration during anesthesia. However, the lack of effect on /k/ 
could not be unequivocally explained. 

SECOND ELECTROMYOGRAPHIC STUDY 

The purpose of the second EMG study was to verify the result of the first 
study that mylohyoid motor loss accompanies the distorted speech during the 
nerve-block condition, and also to study further the changes in muscle activity 
by comparing the muscle activity in normal speech with the potentials generated 
during nerve block. 

There were two differences in procedure from the first study. First, since 
we wanted to find out if the motor loss was inevitable under the normal 
administration conditions of the bilateral mandibular block, the administrator 
(Dr. Catena) tried to avoid heavy infiltration of the mylohyoid nerve, within 
the bounds of the previously described injection technique. Second, we wanted to 
look at block conditions which more nearly corresponded to those used by Scott 
(1970). 

TABLE 1: Peak values in microvolts for mylohyoid muscle in first 
experiment during nerve-block (NB) and normal conditions. Times of 
the peaks (msec) are given in parentheses. 

spring grapes 
  Normal      345      155      285 
  NB           30       35       20 
  msec      (-225)    (125)    (715) 

rooster scratch 
  Normal      175      200      310      370 
  NB           30       40       40       20 
  msec      (-775)   (-440)   (-125)    (325) 

cat's whiskers 
  Normal      315      355      380      370 
  NB           35       40       40       20 
  msec      (-800)   (-505)   (-140)    (200) 

fixed sweater 
  Normal      485      210      140 
  NB           45       45       15 
  msec        (45)    (325)    (585) 

thirsty wasp 
  Normal      185      310 
  NB           30       35 
  msec      (-855)   (-255) 

school blocks 
  Normal      380      400 
  NB           50       30 
  msec      (-145)    (640) 

stove smell 
  Normal      335      355 
  NB           30       50 
  msec      (-215)    (325) 

squirrel nest 
  Normal      215      150 
  NB           50       25 
  msec      (-175)    (635) 

snowballs splashing (/ng/ not plotted) 
  Normal      415      340      430 
  msec      (-140)    (500)    (900) 

spider string 
  Normal      355      300      210 
  NB           35       40       25 
  msec      (-210)    (365)    (790) 

sleeping taxi 
  Normal      425      265      355 
  NB           30       40       40 
  msec      (-155)    (300)    (635) 




Method 



It was necessary for technical reasons to use a second subject for this 
experiment. The material consisted of 30 utterances in the frame "the ______." 
They were randomized into four lists repeated alternately four times, making 
16 lists of 30 utterances each. Fifteen of the utterances were chosen from the 
Scott (1970) list in an attempt to observe the muscle changes in the distorted 
speech which might explain the phonetic changes that she had described. The 
other 15 utterances were words selected from the sentences in the first study 
and from the perceptual study. Two runs were produced, the first under normal 
conditions, the second under blocked condition. 

The electrodes were 0.002-in wires hooked to remain in place. Correct 
placement was checked by observing the oscilloscope while lifting the tongue 
for genioglossus activity, tensing the floor of the mouth while relaxing the 
tongue for geniohyoid activity, saying "ka" for mylohyoid activity, opening the 
mouth with jaw effort for anterior belly of digastric activity, saying "pa" for 
orbicularis oris activity, and lifting the head or opening the mouth under 
pressure for sternohyoid activity. The genioglossus and geniohyoid were also 
checked during swallowing, as their activity differs in timing (Hirose, 1971). 
Electrodes were placed in both sides of the mylohyoid muscle and in both anterior 
bellies of the digastric muscle. 

After the normal run, a total of 7.5 cc of 2% Xylocaine was injected into 
the oral region of the subject. A summary of the injections is given in Table 2. 
Details on the technique used may be found in a standard reference of dental 
anesthesia (e.g., Cook-Waite Labs, 1971). 



TABLE 2: Injections of anesthesia administered in the second EMG study. 

Cranial     Branch              Amount of         Location of      Area of 
Nerve                           Solution          Injection        Sensation 

V (mand.)   Inf. Alveol. n.     1.5 cc ea. side   pterygomand.     mand. alv. ridge, 
            Lingual n.                            triangle         lip, gum; 
                                                                   ant. 2/3 tongue 

V (mand.)   Long Buccal n.      .5 cc ea. side    1st molar        buccal 

V (max.)    Infraorbital        .5 cc ea. side    infraorbital     upper lip, 
            Ant. Sup. Alv.                        foramen          alv. ridge, 
            Middle Sup. Alv.                                       ant. teeth 

V (max.)    Nasopalatine n.     .5 cc midline     post. to         ant. 1/3 
                                                  central          palate 
                                                  incisors 

V (max.)    Post. Sup. Alv. n.  .5 cc ea. side    palate 3rd mol.  post. 2/3 
                                                                   palate 

A rough check of two-point discrimination was made, and when the 
experimenters and subject were satisfied that sensation was lost in the tongue 
and the palate, Ringel's (Ringel, House, Burk, Dolinsky, and Scott, 1970) 55-item 
oral discrimination test of 10 plastic forms was administered. When the subject 
had returned to normal, the Ringel test was again administered. The subject made 
nine errors in normal condition and fifteen errors in the nerve-block condition, 
the difference being errors of shape, not size. Confusion of shape occurred 
three times in normal condition and nine times in nerve-block condition. 
Nevertheless, the experimenters were surprised that there was so little 
difference in performance on this test. It was noted that the subject used the 
usual tongue manipulations during normal condition but relied on deep pressure 
against the palate when sensation was decreased. This technique was reported as 
the method used by successful subjects in the study on the effect of anesthesia 
on oral stereognosis (Mason, 1967). 

The multichannel magnetic tapes produced for each of these runs were 
analyzed in much the same way as in the first experiment. There were some 
refinements in the computer programs. A concise description of the analysis 
procedure is reported by Port (1971). 

Results and Discussion 

The most conspicuous result of the second EMG experiment was that the sub- 
ject's articulation remained clear during the condition of nerve block. The 
speech sounded as acceptable under the nerve-block condition as under the normal 
condition. The utterances were louder under nerve block and produced with what 
might be described as over-articulation. 

This variability of nerve-block effect among subjects was observed during 
the perceptual part of this series of studies. It is unclear why there was no 
speech effect. It might be a difference in muscle use, as this subject produces 
/s/ with tip of the tongue raised and might not rely on mylohyoid muscle activity 
as much as the first subject, who produces /s/ with the dorsum of the tongue 
raised, keeping the tip down. Another explanation for the lack of speech effect 
might be a difference in anesthesia, either in amount or in technique of 
injection. 

Following the first EMG experiment, the investigators were particularly 
interested in this second study in the activity of the mylohyoid muscle. Since 
there were bilateral placements of electrodes in both the mylohyoid muscle and 
the anterior belly of the digastric muscles, the investigators had an 
opportunity to study the activity on both sides of these muscles. During the 
normal run, before the injections of anesthesia, the mylohyoid and the anterior 
belly showed activity similar to the first subject. The anterior belly peaked 
for mouth opening and the mylohyoid for velar gestures and somewhat for the /s/ 
clusters. 

During the condition of nerve block, however, there was a change in activity 
in both muscles on the right side. The right anterior belly of the digastric was 
in all cases less active than normal after anesthesia. The right mylohyoid was 
consistently less active than normal for velar gestures, but for the /s/ 
clusters, it was sometimes less active and sometimes more active than normal. The 
gain in activity for the nonvelar gestures offset the loss for /k/. The 
decreased activity on the right side in this experiment was not as pronounced as 
it had been in the first EMG study, indicating that the attempt on the part of 
the dentist to avoid the motor mylohyoid nerve was partially successful. The 
limited effect on the right side was presumed by the investigators to be the 
result of some infiltration of the anesthetic in the area of the mylohyoid nerve, 
especially affecting nerve fibers which are motor to the digastric muscle. 

In contrast with the instances of decreased activity observed on the right
side of the mylohyoid and anterior belly of the digastric muscles, the left side
of these muscles was usually more active than normal while the anesthesia was in
effect. Figure 4 demonstrates the asymmetry of effect. The right peak in each
of the four graphs represents the labial closing for /p/ in "duckpond." It can
be seen that the right side of both muscles was quite active during normal speech
but dropped in activity during speech with nerve block. Figure 4 also shows that,
by contrast, both muscles on the left side were more active than normal under
block.

Figure 5 summarizes the activity for each muscle during the nerve-block
condition relative to its normal activity. Taking the normal peak amplitude in
microvolts of each electrode during the central 400 msec around each line-up
point as 100%, the percentage of normal activity was computed for the peak
amplitude during the nerve-block condition for each utterance. The average of
the 30 percentages is represented in the figure.
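
The percent-of-normal measure described above can be illustrated with a minimal sketch. The function names, the sample values, and the choice of plain Python lists are assumptions made here for illustration; the report does not describe the authors' actual computation beyond the prose above.

```python
def peak_in_window(times_ms, amplitudes_uv, half_width_ms=200.0):
    """Return the peak amplitude within +/- half_width_ms of the line-up point.

    times_ms are sample times relative to the line-up point (0 msec).
    A half width of 200 msec gives the central 400 msec window of the text.
    """
    window = [a for t, a in zip(times_ms, amplitudes_uv) if abs(t) <= half_width_ms]
    if not window:
        raise ValueError("no samples fall inside the analysis window")
    return max(window)


def percent_of_normal(normal_peaks_uv, block_peaks_uv):
    """Mean of the per-utterance percentages, with each normal peak taken as 100%."""
    if len(normal_peaks_uv) != len(block_peaks_uv):
        raise ValueError("need one normal and one block peak per utterance")
    percentages = [100.0 * b / n for n, b in zip(normal_peaks_uv, block_peaks_uv)]
    return sum(percentages) / len(percentages)


# Hypothetical values, not data from the report.
peak = peak_in_window([-300, -100, 0, 150, 400], [90, 20, 35, 60, 120])
mean_percent = percent_of_normal([100, 200], [50, 300])
```

Averaging the percentages, rather than the raw microvolt values, keeps an utterance with a large normal peak from dominating the summary for a muscle.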

In this figure, muscles have been arranged into three groups: those where
the bilateral mandibular block might be expected to have a motor effect, those
where the effects of the bilateral mandibular block should be sensory, and those
where the effects observed lie outside the field of the bilateral mandibular
block, though within the field of the other blocks performed. This arrangement
is intended for comparison with the results of Experiments 3 and 4, although it
is not logical for the data of this experiment. The meaning of the term "sensory"
is discussed at greater length in the final section.

Overall, there is a widespread change in the activity of the muscles
sampled, both those which are directly associated with tongue movement and those
which are not. It seemed desirable to try to separate motor and sensory effects
of the mandibular blocks.

THIRD ELECTROMYOGRAPHIC STUDY 

Since the first two EMG studies with nerve block appeared to reflect the
results of a mixed nerve block, that is, the injection of anesthesia apparently
affected both motor and sensory fibers of the trigeminal nerve, it was
considered important to make further attempts to separate these factors. The
third and fourth investigations were designed to anesthetize the lingual nerve
alone, producing a purely sensory block, and to anesthetize the mylohyoid nerve
alone, producing a purely motor block. The third EMG study was an attempt to
investigate the effects of the lingual nerve block.

Furthermore, since we found that there was a change in EMG output under the
nerve-block condition even without a speech effect, we wanted to be sure that we
used only subjects whose speech was perceptually distorted under the block
condition.

Figure 5: Mean percentages of normal peak EMG amplitudes in microvolts for
muscles during nerve block, Experiment II.

Method

The 11 sentences used in the first EMG study were repeated 18 times each in
nine randomized lists by two subjects. Stress was placed on the first key word,
as in "It could be the sleeping taxi." Subjects were selected by conducting a
short trial run during which four candidates were given the routine bilateral
mandibular injection of 2% Xylocaine with 1:100,000 epinephrine. Of the three
candidates who evidenced speech distortions during the nerve block, two, both
male speakers of English, were chosen as subjects. Tests of two-point discrimination of the
tongue using a Downes aesthesiometer and of oral stereognosis using the National 
Institute of Dental Research forms were conducted during normal and blocked con- 
ditions. By a slight modification of the injection technique an attempt was 
made to block only the lingual nerve using 1 cc Xylocaine with 1:100,000 parts 
epinephrine on each side. During the normal condition subject DL could make 
accurate two-point discriminations at 3 mm in most cases, requiring up to 4 mm 
separation in some instances at the anterior part of the tongue and up to 1 cm 
separation at some points on the posterior part of the tongue. During the nerve- 
block condition, however, DL failed to discriminate accurately in five out of 
eight two-point placements even when point separation reached 1.5 cm. Oral 
stereognosis ability declined also. Eight errors out of 18 were scored during 
the normal condition and 14 errors were scored during the nerve-block condition. 

The second subject PN made few errors of two-point discrimination at 3 mm 
normally but reported no sensation at all 16 placements during the blocked con- 
dition. Three errors of identification of the forms normally were increased to 
13 errors out of 18 possible identifications during the nerve block. The inves- 
tigators presumed success in lingual nerve isolation in the case of DL, as sensa- 
tion was reported lost in the anterior two-thirds of the tongue but remained on 
the lower lip and gingivae. The effect upon subject PN was less clear, as there 
was some loss of sensation in the lower alveolar ridge and lower lip, indicating 
a partial block of the inferior alveolar nerve. 

EMG recordings were made from the superior orbicularis oris, the anterior 
genioglossus, bilateral placements in the mylohyoid, and in the anterior belly 
of the digastric muscles. New in this experiment were electrode placements in 
the superior longitudinal muscle of the tongue. Bilateral insertions were made 
approximately 1 cm from the midline and 1 cm from the tip of the tongue. The 
insertions were superficial, with an estimated depth of 2 to 3 mm. The hooked 
wires were located about 1 cm posterior to the point of insertion. 

The method of recording and analysis of data was the same as for the 
second EMG study. 

Results and Discussion 

Again, results may be described by grouping the muscles investigated accord- 
ing to whether the block effects on them may be considered to be sensory, motor, 
or indirect. The results indicate first, that the nerve block produced a rather 
dramatic effect on the contraction of the intrinsic tongue muscles from which we 
recorded. Subject DL evidenced a drop in activity during the nerve-block condi- 
tion. The superior longitudinal muscle normally peaks for /6/ and /I/. Both 
left and right electrode placements showed decreased activity, as did the
genioglossus, another tongue muscle. Subject PN, however, reacted quite
differently to the nerve block. Superior longitudinal activity was depressed on
the right side in a manner similar to the first subject, but the left electrode,
in contrast, recorded much more electrical activity during the nerve-block
condition than during the normal condition. The genioglossus muscle was also
more active than normal. The effect of the nerve block in tongue muscles was
generally depression of activity in subject DL; in subject PN, one side was
depressed and the other side evidenced greater effort under nerve block.

The nerve block also produces decided changes in EMG activity in muscles
served not by sensory nerves involved in this nerve block but by motor nerves.
The mylohyoid muscle, which normally contracts for /k/, showed greatly decreased
activity on the left side. Subject PN showed almost total bilateral inactivity
of this muscle for each token of each utterance type. Both subjects showed
depressed anterior digastric activity during the nerve-block condition.

There is a change in the activity of a muscle whose innervation lies
entirely outside the field of the block, the superior orbicularis oris. For
subject DL it was somewhat depressed in amplitude during nerve block, but for
subject PN it was much more active. Examples of orbicularis oris activity are
shown in Figure 6. Changes in the level of activity peaks for /p/ can be seen
in the block condition.

When the absolute peak values in microvolts during nerve block are compared
to the normal peak values, and the percent of normal is averaged for each muscle,
we can see the pooled difference from the normal condition which the nerve block
produces. Again, only the peaks close to the averaging line-up point were chosen
for analysis. Figure 7 shows that for subject DL, the nerve block produced a 
consistently depressed state of activity. The general depression extended even 
to muscle fibers that should have been completely unaffected by the block. Sub- 
ject PN, however, has a far more complex pattern of activity over a wide range of 
muscles (Figure 8). 

To summarize the effects of the nerve block in this experiment, the first
class of muscles, those innervated by motor fibers from the blocked nerve, were
consistently depressed or inactive. Thus, it seems that despite the attempt to
anesthetize the lingual nerve alone, there is evidence of infiltration of the
anesthesia. The next two classes of muscles, those presumably associated with
sensory fibers from the blocked nerve and those which should be independent of
the blocked nerve, were sometimes less active, sometimes more active, depending
upon the side of electrode placement and upon the idiosyncratic reaction of the
subject.

FOURTH ELECTROMYOGRAPHIC STUDY 

If it is difficult to isolate the lingual nerve without affecting the motor
fibers of the mylohyoid nerve, it is possible perhaps to anesthetize the
mylohyoid nerve alone, producing a motor block while leaving the sensory fibers
of the lingual nerve unaffected. This was the purpose of the final EMG study.

Method 

The method was a repetition of the third study with the same electrode
placements, the same utterance lists, and one of the same subjects, PN. The
difference was that 0.5 cc of 2% Xylocaine with 1:100,000 parts epinephrine was
injected on each side at the juncture of the lingual mucosa and the floor of the
mouth at the level of the first molar. Data analysis was the same as for the
second and third experiments.

Figure 6: Reduced activity of the orbicularis oris during nerve block for one
subject and increased activity for another subject, Experiment II.

Figure 7: Mean percentages of normal peak EMG amplitudes in microvolts for
muscles during nerve block, subject DL, Experiment III.

Figure 8: Mean percentages of normal peak EMG amplitudes in microvolts for
muscles during nerve block, subject PN, Experiment III.

Results and Discussion 

The injection of the anesthesia directly into the mylohyoid muscle on each
side produced much more of an effect on the left side, which was completely
inactive after the block, than on the right side, which was depressed in
activity but remained active (Figure 9). The left anterior digastric electrode
was misplaced.

The intrinsic tongue muscles did not greatly alter activity, although the
left side of the superior longitudinal was generally less active than normal and
the right side more active than normal. The orbicularis oris was also somewhat
more active during the nerve-block condition.

The subject's speech remained as well articulated as normal. The subject 
was not conscious of any sensory or motor changes as a result of the injection
of anesthesia. 

It seems to be as difficult to obtain a bilateral motor effect as it is to
obtain a purely sensory nerve block. There were changes in the amplitude of the
muscles sampled, however, even when there was little or no sensory loss.

SUMMARY OF THE EMG STUDIES AND DISCUSSION 

Although the traditional bilateral mandibular nerve block often produces
distortions in some of the gestures of rapid, connected speech, there is evidence
that the effect may have both motor and sensory components. This was indicated
by the total inactivity of the mylohyoid muscle and the anterior belly of the
digastric muscle in the first study. The second study demonstrated the
possibility of compensatory activity coupled with a lack of perceptible effect
of the nerve block upon the articulation of speech. The third study confirmed
the finding of the motor effect of the nerve block and increased the evidence of
compensatory reorganization. Furthermore, the results demonstrated nerve-block
effects upon muscles whose innervation is independent of the nerves involved.
Increased activity under nerve block of muscles which are not served by either
sensory or motor fibers of the anesthetized nerve indicates either a general
reorganization of activity in an effort to compensate for some motor or sensory
loss, or a more central effect of the anesthesia. It does not seem, from the
results of the fourth study, that the motor effect alone is sufficient to
distort the speech, although the fourth study also shows that there is some EMG
reorganization without any evidence of the normal sensory effects of the block.

At this point, it seems worthwhile to try to reassess the results of these
experiments in the light of the explanations usually offered for the nerve-block
effect.

The primary reason for the effect may be motor, as we have previously
suggested (Harris, 1970; Borden, 1972). On anatomical grounds, it is plausible
that the block would affect motor innervation; indeed, it is quite difficult to
make the sensory block while avoiding the motor innervation of two of the
muscles, the mylohyoid and the anterior belly of the digastric. However, the
pattern of affected consonants makes a primary motor cause for the block effect
unlikely. We would expect that inactivity of the mylohyoid muscle alone would
make /k/ the most affected consonant; in fact there is general agreement that
this consonant is spared. Furthermore, as we showed in Experiment IV, a block
of the mylohyoid nerve apparently will not produce a perceptible speech
distortion, at least in gross terms.

Figure 9: Mean percentages of normal peak EMG amplitudes in microvolts for
muscles during nerve block, subject PN, Experiment IV.

The most traditional explanation of the speech effect is that it is a con- 
sequence of decreasing sensory feedback from the oral area — either tactile or 
proprioceptive or both. 

The "tactile" explanation is that a block of the lingual nerve cuts sensa- 
tion from the surface of the tongue, which leads to imprecision in its placement. 
Again, the pattern of affected consonants makes the explanation somewhat implau- 
sible; in this case the consonants /t/, /d/, and /n/ should be maximally 
affected; they are not. Turning to the experiments reported above, the muscles 
most affected should be the superior longitudinal intrinsic muscles of the 
tongue, which lie closest to the numbed lingual surface. There is no evidence 
that their activity pattern is more, or less, affected than that of muscles 
lying deeper in the tongue body, or, indeed, of muscles lying outside the field 
of the block entirely. A simple tactile explanation does not seem tenable. 

Another explanation for the block effect is that it causes interference
with the proprioceptive return from muscle spindles in the tongue. If each
muscle adjusts to a fixed length based on the return from its own stretch
receptors, as has been described by MacNeilage (1970), then interference with
this pathway should have serious effects on speech. Traditionally, it had been
assumed that the lingual nerve carried proprioceptive as well as tactile
information from the anterior two-thirds of the tongue, because the hypoglossal
nerve has no sensory root (Blom, 1960). Studies in rhesus monkeys by Bowman and
Combs (1968) would indicate that nerve fibers from muscle spindles in the tongue
do course along the hypoglossal nerve for part of the way and then cross to join
some cervical nerves. If this is the case in humans, the block spares
proprioceptive feedback, since the injection site does not lie on the pathway of
the hypoglossal nerve. If, on the other hand, proprioceptive feedback is carried
in the lingual nerve, we would expect that the tongue muscles would be affected
by the mandibular block, but not muscles outside the tongue, as we found in our
third study.

Taking these results together, it would appear that any sensory effects of
the block must be rather general. The system might be responding to an altered
pattern of information sent back to the central nervous system with a changed
motor output which affects muscles whose sensory feedback is normal; that is,
there is no muscle-specific correction. These changes are most likely to alter
those consonants which require the greatest degree of articulatory finesse.

Everyone writing on this effect recently has noted that the effect is
restricted to a small class of consonants. The restricted results of all these
studies provide us with some insight into the small size of the effect. The EMG
signals may change size radically under the block; they do not seem to change
their temporal relationship to each other. Changes in relative timing of the
muscle gestures would produce far more devastating effects on articulation.

Recent work by Scott and Ringel (1971) has shown that the speech of subjects
under block does not resemble that of a group of dysarthric speakers studied by
Lehiste (1965) and Tikofsky and Tikofsky (1964). Their argument is that the
effects cannot be motor, and hence, must be sensory.

Another possibility which probably should be considered is that the effect
may be due to an additional factor, a generalized depression of central activity
caused by the local anesthesia. Drowsiness is a well-known side effect of
Xylocaine. Pharmaceutical studies indicate that local anesthetics may appear in
considerable quantities in the blood stream (de Jong, 1968), and an effect upon
speech is one clinical sign of a rising level of anesthetic in the blood.
Furthermore, it has been shown that local anesthetics readily cross the
blood-brain barrier (Usubiaga, Moya, Wikinsky, and Usubiaga, 1967). It is
possible that a slight loss of central control may relate more directly to the
slurring of speech than either the motor or sensory effects evidenced at the
periphery. The speech effect, when it does exist, sounds perceptually very like
'drunk' speech. 'Drunk' speech is accepted as a consequence of the alcohol
having crossed the blood-brain barrier to affect the central control of speech.

Whatever the cause of the nerve-block effect, it remains an important ex- 
perimental technique because it is one of the few experimental means we have of 
altering speech production in normal adult speakers. Further work should be 
directed towards exploring the alternatives of general central effect versus sen- 
sory deprivation. Furthermore, EMG studies should be aimed at exploring other 
blocks to see if the patterns of their effects are similar to those of the
bilateral mandibular block.

Laryngeal Control in Korean Stop Production

Hajime Hirose, Charles Y. Lee, and Tatsujiro Ushijima



In Korean there is a three-way distinction in both manner and place of
articulation that differentiates nine stop consonant phonemes. Linguists have
disagreed about the manner classifications, describing them phonetically in
various ways for initial position. Thus, Category I is characterized as
voiceless, tense, long, strong, forced, and/or glottalized; Category II is
voiceless, lax, slightly aspirated, and/or weak; Category III is voiceless,
heavily aspirated, and lax according to some phoneticians but tense according to
others (Martin, 1931; Umeda and Umeda, 1965; Abramson and Lisker, 1972). It is
also known that the Category II stop typically becomes voiced intervocalically.

Much has been published in an effort to clarify the acoustical and
physiological properties that differentiate these three manner categories. Among
those, Lisker and Abramson (1964) made an acoustical investigation into various
languages and showed that values of voice onset time (VOT), the temporal
relation between stop release and onset of glottal pulsing, provide the most
useful measure for differentiating various conditions of voicing and aspiration
in word-initial stops. They noted, however, that Korean is peculiar in that the
resolution of VOT values between Categories I and II is not clearcut but shows
overlapping values, while Category III is well separated from the others.
Similar observations have been reported by others (Kim, 1965, 1970; Han and
Weitzman, 1970).

Abramson and Lisker (1972) later studied the phonetic significance of the
VOT values from a perceptual viewpoint by giving a continuum of synthetic VOT
variants (Lisker and Abramson, 1970) to native speakers for identification as
Korean syllables. Their results indicated that there must be another dimension
that works with VOT in distinguishing the categories, although the timing of
glottal adjustments relative to supraglottal articulation does contribute to the
consonant distinctions.

Kagaya (1971) investigated laryngeal gestures in Korean stop production
closely in a native speaker using fiberscopic observations. He found that there
are differences between the stop types in both the time course of glottal width
and the apparent glottal conditions in the succeeding vowel segment. In
particular, the adjustment of the vocal folds was found to be substantially
different for Category I or "forced" type when compared to the other two. In
Category I the glottis closed rapidly and there was complete contact of the
vocal processes before the onset of voicing, while a slight opening still
remained in the membranous portion of the glottis.

Faculty of Medicine, University of Tokyo, Japan; Haskins Laboratories,
New Haven, Conn., 1970-1972.

Department of Linguistics, University of California, San Diego.

Haskins Laboratories, New Haven, Conn.; on leave from the Faculty of Medicine,
University of Tokyo, Japan.

[HASKINS LABORATORIES: Status Report on Speech Research SR-34 (1973)]


Lee and Smith (1971) measured both intraoral and subglottal air pressures
simultaneously during the production of the three kinds of Korean stops. They
found that subglottal pressure was higher for Category III, the highly aspirated
stop, than for the other two categories. They also compared the dynamic
patterns of subglottal pressure slope for the three categories and found that
the Category III stop showed the most rapid increase in subglottal pressure in
the time period immediately before the stop release. They concluded that the
highly aspirated stop was the most "dynamic" in this respect.

In recent years, a considerable number of electromyographic (EMG) studies
of the laryngeal muscles have been reported. Among those, the Haskins group
(Hirose and Gay, 1972a, 1972b; Hirose, Lisker, and Abramson, 1972) investigated
EMG patterns of the intrinsic and extrinsic laryngeal muscles for different
kinds of languages and reported that there was a reciprocal pattern of activity
between the adductor and abductor muscle groups of the larynx for
voiced-voiceless and aspirated-unaspirated contrasts.

The primary purpose of the present study was to investigate
electromyographically the actions of the intrinsic laryngeal muscles in
production of Korean stop consonants. A native speaker of Korean served as the
subject. In a separate experiment an attempt was made to take fiberoptic motion
pictures of the glottis of the same subject during stop production.

Method 

1. EMG experiment. One of the present authors (C.Y.L.), a native Korean
speaker from Kyung-sang-book-do, Si?ng-Ju-goon, Hwa*-dog-myun, was the subject in
this experiment. He read randomized lists of test sentences 16 times each. In
each sentence a test word was embedded in the frame /ikasi ___ ita/ (This is
___.). In the first part of the experiment, test words in the form of CVl with
a short, unstressed vowel were used. The consonant (C) was labial, dental, or
velar, and the vowel (V) was /i/, /a/, or /u/. About half of the 27 phonemic
sequences thus formed for the test words were nonsense syllables, but they did
not violate any phonological constraints of the dialect in question. In the
second part of the experiment, test words of the form VCVl were used.¹

EMG recordings were made using hooked-wire electrodes. The electrodes were
inserted into the interarytenoid muscle (INT) perorally by indirect laryngoscopy
using a curved probe, but a percutaneous approach was employed for insertion
into the vocalis (VOC)², the lateral cricoarytenoid (LCA), and the cricothyroid
(CT) muscles. Insertion into the posterior cricoarytenoid (PCA) was also
attempted perorally but proved unsuccessful because of anatomical difficulty.
EMG activities from the orbicularis oris (OO) and the sternohyoid (SH)³ were
also recorded percutaneously. A more detailed description of the electrode
preparation and insertion techniques can be found in previous reports (Hirose,
1971a; Hirose, Gay, and Strom, 1971).

¹ It was noted in listening tests and oscillographic observations that the vowel
/i/ after /s/ in the frame sentence /ikasi/ was consistently devoiced by this
subject, yielding [igasi̥] as the pronunciation.

EMG signals were recorded on a multichannel data recorder simultaneously
with acoustic signals and automatic timing markers. The signals were then
reproduced and fed into a computer after appropriate rectification and
integration. The EMG signal from each electrode pair was averaged over more than
14 selected utterances of each test sentence with reference to a line-up point
on the time axis representing a predetermined speech event. In the present
experiment, the release of stop closure in each test word was used for the
line-up. The data-recording and computer-processing systems used in the present
experiment are described in more detail by Port (1971).
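
The processing chain described above (rectify, integrate, then average tokens aligned at a line-up point) can be sketched as follows. This is an illustrative sketch only: the running-mean smoother stands in for the integration step, whose actual time constant is not given here, and all function names, parameters, and sample values are assumptions rather than the system described by Port (1971).

```python
def rectify(signal):
    """Full-wave rectification: take the absolute value of every sample."""
    return [abs(x) for x in signal]


def integrate(signal, window):
    """Smooth with a trailing running mean of `window` samples (assumed integrator)."""
    out = []
    total = 0.0
    for i, x in enumerate(signal):
        total += x
        if i >= window:
            total -= signal[i - window]  # drop the sample leaving the window
        out.append(total / min(i + 1, window))
    return out


def average_at_lineup(tokens, lineup_indices, before, after, window):
    """Average processed tokens sample by sample around each token's line-up index."""
    segments = []
    for signal, idx in zip(tokens, lineup_indices):
        if idx - before < 0 or idx + after >= len(signal):
            continue  # skip tokens that do not cover the whole analysis window
        processed = integrate(rectify(signal), window)
        segments.append(processed[idx - before: idx + after + 1])
    if not segments:
        raise ValueError("no token covers the requested window")
    return [sum(column) / len(segments) for column in zip(*segments)]


# Two hypothetical tokens whose stop release falls at different sample indices.
tokens = [[0, -2, 4, -2, 0], [0, 0, -2, 4, -2, 0]]
averaged = average_at_lineup(tokens, [2, 3], before=1, after=1, window=1)
```

Aligning on the stop release before averaging means that activity time-locked to the release is reinforced, while activity at inconsistent times across tokens tends to be smoothed out.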

2. Fiberoptic observation. Separately from the EMG experiment, motion
pictures of the glottis of the same subject were taken using a fiberscope at a
film rate of 60 frames per second. As the test utterances, isolated nonsense
/CVCV/ and /VCV/ words were used, where /V/ was always /i/. Appropriate frame
sequences for each type of stop were then examined frame by frame with special
reference to the time course of glottal width as measured at the vocal processes.
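
At 60 frames per second, each film frame spans about 16.7 msec, which sets the time resolution of the glottal-width measurements. A small sketch of converting frame numbers to times relative to the stop release is given below; the function name and the example frame numbers are hypothetical.

```python
FRAME_RATE_HZ = 60.0  # film rate stated in the text


def frame_time_ms(frame, release_frame, frame_rate_hz=FRAME_RATE_HZ):
    """Time of a film frame in msec relative to the frame showing the stop release."""
    return (frame - release_frame) * 1000.0 / frame_rate_hz


# Hypothetical frame numbers: the release is assumed to be seen at frame 12.
times = [frame_time_ms(f, release_frame=12) for f in (9, 12, 18)]
```

Negative times fall in the closure period before the release and positive times after it.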



Figure 1 illustrates averaged EMG curves of VOC and CT for the three
different bilabial stops in word-initial position. The zero on the abscissa
marks the line-up point for averaging, which corresponds to the release of the
stop closure. The time axis is marked off every 100 msec. In each type three
curves are superimposed for the postconsonantal vowels /i/, /a/, and /u/, which
are represented respectively by thin, thick, and dashed lines.

We note in Figure 1 that the curves are similar for a given stop type, i.e.,
those for different postconsonantal vowels coincide fairly well. This holds true
both for VOC and CT as shown in the figure, and also for INT and LCA, which are
not shown here.

Figure 1 also shows that the pattern of CT activity is more or less constant
for the three stop types, being characterized by two peaks separated by a
temporal suppression in the middle portion of the test utterance.

Figure 2 compares the activity of INT and VOC for the test utterances
containing the three types of bilabial stops in word-initial position followed by
the vowel /i/. The INT activity starts to increase before initiation of the test
utterance and, after reaching its peak near the beginning of the first vowel [i],
the activity decreases for the voiceless segment [si̥]. In the case of Category
III /pʰ/, INT activity continues to be suppressed and then steeply increases
again near the stop release. In Category II /p/ and Category I /P/, INT
activity shows a slight increase after a marked suppression for the [si̥] segment
and then stays on a moderate level.

² Recordings from VOC were not obtained in the session using VCVl type test words.
³ The data for SH will not be discussed in this report.

The activity of VOC also starts to increase before the initiation of the
utterance but there is a slight delay in timing when compared with that of INT.
The pattern for /pʰ/ and /p/ is characterized by two peaks, with suppression
between the peaks possibly reflecting the voice cessation around the word
boundary. For /P/, on the other hand, VOC shows a marked increase in activity
immediately before the release.

Figure 3 illustrates the patterns of INT, LCA, and OO activities for the
three bilabial stops in word-initial position (left) and in word-medial position
(right). The postconsonantal vowel is /i/ for all cases. For the test words
with the stop consonant in word-medial position, the onset of the phonated vowel
segment at the beginning of the word after the devoiced vowel of the carrier was
taken as the line-up point.

The general pattern of LCA activity is similar to that of VOC in that LCA
also shows increasing activity before the stop release in the case of /P/,
regardless of the position of the consonant, while it shows two separate peaks
for both /pʰ/ and /p/. There is no discernible difference in the pattern of OO
activity among the three different stop types when the consonants are in
word-initial position. In word-medial position, however, OO activity is
definitely less for Category II /p/, here pronounced [b], than for the other two.

The activity of INT for test utterances with the stop consonants in word- 
medial position increases before the initiation of the utterance and shows a 
peak approximately 300 msec before the line-up point, followed by a steep decline 
appropriate for voiceless [6]. The activity increases again approximately 100 
msec before the line-up point probably for the vowel segment that precedes the 
stop closure period and, after reaching the second peak near the line-up, it is 
then suppressed for the consonantal segment. The suppression is most marked for
/pʰ/ in both degree and duration. For /pʰ/, there is a steep elevation of
activity after the period of suppression. For /P/, INT suppression reaches its
greatest point earlier than for /pʰ/, and is followed by a slight elevation
toward a moderate level of activity. For /p/, which is voiced in word-medial
position, INT activity gradually decreases after the peak near the line-up point
and then sustains a moderate level of activity.

In the case of LCA, the pattern for /pʰ/ also appears to be characterized
by a marked suppression followed by a steep increase. For /p/ in word-medial
position, LCA activity stays moderate for the consonantal segment as well as
for the subsequent portion of the test utterance. There is a definite increase
in LCA activity for /P/ in word-medial position approximately 150 msec after the
line-up, corresponding roughly to the stop closure period.

Figure 4 shows time courses of the glottal width for representative
utterance samples by the same subject of the three bilabial stops in absolute
initial position. The rectangles represent /P/ (Category I), the circles /p/
(Category II), and the triangles /pʰ/ (Category III). Filled rectangles and
circles indicate that vocal fold vibration was observed in that frame. The zero
on the abscissa marks the release of stop closure.

The figure shows that the glottis begins to close earlier relative to the
stop release in Categories I and II, while it stays wide open until the release
in Category III. In other words, it seems that there is a considerable
difference in glottal width during the consonantal closure period between
Category III and the other two. When we compare Category I and Category II, it
appears that in Category I the glottis closes somewhat more rapidly and a
complete contact of the vocal processes is found before the stop release, while
in Category II the glottis closes gradually.

The results for dental and velar stops were essentially comparable to those 
obtained for bilabial stops in both EMG and fiberoptic experiments. 

Discussion 

The experimental results of the present study clearly suggest that coordi- 
nated actions of the laryngeal muscles characterize the different types of 
Korean stops. 

The "aspirated" stop (Category III) appeared to be characterized by
suppression of all the adductor muscles of the larynx immediately preceding the
articulatory release. This suppression was always followed by a steep increase
in activity which seemed to correspond to the rapid closure of the glottis after
stop release, as noted in fiberoptic observations both in this study and
elsewhere (Kagaya, 1971).

The pattern of INT activity was almost the same for Category I and Category
II stops in word-initial position. It has been observed in previous studies
that INT actively participates in the adduction of the vocal fold in speech
articulation (Hirose, 1971b; Hirose and Gay, 1972a). The pattern of INT activity
is usually known to be reciprocal with that of PCA, the only known abductor of
the vocal fold, which was not examined in the present study. In the phonetic
environment examined here, INT activity was found to be markedly suppressed for
the voiceless segments of [%i) (where the glottis seemed to be wide open) after
an initial increase for the voiced segment in the preceding context. The glottal
width during the stop closure period has been found to be narrower in Category I
and II stops than in Category III. In the light of this fiberscopic finding it
is expected that in the case of Categories I and II the glottal width during the
stop closure becomes narrower than for the preceding voiceless [%\]. A slight
increase in INT activity observed in Categories I and II immediately before the
onset of the articulatory closure seems to indicate the active narrowing of the
glottis described above.

The patterns of VOC and LCA activity were most characteristic for Category
I. Both muscles, VOC in particular, showed a marked increase in activity before
the stop release in Category I, which presumably resulted in an increase in
inner tension of the vocal folds as well as in constriction of the glottis
during or immediately after the articulatory closure. It should be reasonable
to assume that these activity patterns of VOC and LCA are the physiological
correlates of the Category I stop associated with the subjective impression
(possibly including laryngeal sensations in production) of "laryngealization" or
"glottalization" which has often been claimed for this type of stop (Abramson
and Lisker, 1972; Ladefoged, 1973). On the basis of fiberoptic and acoustic
data, Fujimura (1972) stated that Category I stops were expected to show a
marked activity of VOC. The present EMG result seems to support this prediction.
We cannot be certain, however, that these findings on VOC and LCA activity in
Category I stops should be taken as physiological evidence of so-called
"tenseness" of Category I.

It is quite evident, at any rate, that the patterns of VOC and LCA activity
are different from that of INT in the production of Category I stops. Our
previous studies indicated that the pattern of VOC and/or LCA activity often
differed from that of INT in laryngeal articulatory adjustments. For example,
LCA and VOC always show marked activity for glottal stop production, while INT
does not (Hirose and Gay, 1972b). In our preliminary EMG experiment on Danish
subjects, VOC and LCA usually showed a marked increase in activity for the
production of Danish stød, while INT did not show any activity related to the
stød production. In the light of these findings, it seems reasonable to assume
that VOC and LCA play a different role from INT in certain types of laryngeal
adjustments. In other words, it can be assumed that there is a functional
differentiation of the adductor muscles of the larynx, although INT, VOC, and
LCA are often grouped together as adductor muscles in the classical sense.

It is also interesting to note that the pattern of OO activity was different
between word-initial and medial positions; i.e., it was markedly low for
Category II stops in word-medial position, while it was almost the same for all
the stop types in word-initial position. One may argue that the lower OO
activity for Category II stops in word-medial position could be related to the
so-called "laxness." The exact nature of the "tense-lax" feature has not been
well documented. In particular, its physiological correlates are still
ambiguous, although there have been several reports claiming that tenseness
exists in reality in terms of overall tensing of the speech muscles or of a
stronger organic pressure (Fischer-Jørgensen, 1968; Malécot, 1970). In any
event, it should be stressed that this difference in OO activity is observed
only in word-medial position where the occlusion of a Category II stop is
completely voiced; in word-initial position OO activity is not distinctive. It
is inappropriate at this point to come to a conclusion about the reality of
"tense-lax" opposition as a universal feature in many different languages.
However, at least for Korean stops, the laryngeal articulatory adjustment is not
limited to a simple dimension of adduction-abduction of the vocal folds: another
dimension, represented by VOC activity for example, also must be taken into
consideration.



Patterns of Palatoglossus Activity and Their Implications for Speech
Organization*

F. Bell-Berti+ and H. Hirose++



While the levator palatini has been established as the muscle primarily
responsible for soft palate elevation, no antagonist muscle has been found to be
responsible for soft palate lowering. Some investigators (Fritzell, 1969;
Lubker, Fritzell, and Lindquist, 1970) have suggested that the palatoglossus is
a muscle serving this palate-lowering function. Other investigators (Berti and
Hirose, 1971), who have not found evidence supporting this hypothesis, have
found instead that palatoglossus activity corresponds to tongue body movements:
velar consonant and back vowel articulations.

We shall report on electromyographic (EMG) data obtained from two subjects:
the first, LJR, is a native speaker of American English; the second subject, BG,
is a native speaker of Swedish and is the same subject whose data were reported
by Lubker, Fritzell, and Lindquist (1970) in a study of nasal articulation in
Swedish. The test utterances were nonsense disyllables designed to determine
the effect of vowel color, and the place and manner of stop consonant
articulation on palatoglossus activity. For example, one utterance is /fapmap/.
The data were processed using the Haskins Laboratories' EMG system.

Results and Discussion 

We will begin our discussion by examining the EMG potentials associated
with labial nasal articulations.

The zero point in all the figures occurs at the acoustic boundary between 
the oral and nasal stops. The acoustic signals are represented above each 
figure. Of course, the EMG signal associated with an acoustic event precedes 
that event. 



*Paper presented at the 85th meeting of the Acoustical Society of America,
Boston, Mass., April 1973.

+Haskins Laboratories, New Haven, Conn.; Montclair State College, Upper
Montclair, N. J.; and The Graduate School, The City University of New York.

++Faculty of Medicine, University of Tokyo, Japan. Visiting researcher, Haskins
Laboratories, 1970-1972.

There is no palatoglossus activity associated with labial nasal
articulation for subject LJR: no activity is seen for /fimpip/ or /fipmip/
(Figure 1). Subject BG, whose data are displayed in the lower half of Figure 1,
does show palatoglossus activity for labial nasal articulation. For subject BG,
the activity peak occurs earlier in /fimpip/ (-200 msec) than in /fipmip/
(-110 msec): that is, the peak shifts in the direction of the nasal.

No clear activity is observed for any of the vowels for subject LJR (Figure
2). Subject BG again presents activity for the labial nasal in /fimpip/ (Figure
2). In addition, activity is evident for the vowels in /fampap/ and /fumpup/.
The greatest activity occurs for both of the vowels in /fumpup/.

Peaks are observed for the velar oral stop in /fakmap/ near -250 msec for
both subjects (Figure 3). In addition, subject BG has a second peak at -70 msec
for the labial nasal, /m/, in /fakmap/. Both subjects show peaks at -180 msec
for the velar nasal, /ŋ/, in /faŋpap/.

In summary, palatoglossus activity for subject LJR corresponds essentially 
to velar articulations. Activity is observed only when the stop is velar — 
regardless of the oral or nasal manner of articulation. Subject BG, however, 
presents palatoglossus activity for all nasal articulations. He also shows 
activity for back vowels, with greater activity for /u/ than /a/. Subject BG 
also shows activity associated with velar articulations: /k/ and /ŋ/.

We have concluded, therefore, that palatoglossus activity is primarily 
associated with tongue body movements, but may be implicated in the nasal manner 
of articulation in some speakers. We may not yet specify whether these
differences in palatoglossus function (i.e., tongue body vs. nasal gestures)¹
are language-specific or idiosyncratic in nature: our Swedish speaker was the
same individual whose data were reported by Lubker, Fritzell, and Lindquist
(1970). We await further cross-language studies to determine the cause of these
differences. We may say, though, that no universal mode of nasal articulation,
corresponding to the universal mode of oral articulation found in levator
palatini function, may be specified: that is, while palatoglossus function for
nasal articulation may exist for speakers of some languages, this function does
not occur for all speakers of all languages.

Subject BG, the Swedish speaker of this experiment, shows palatoglossus 
activity for vowels, nasals, and velar consonants. One additional finding for 
this subject, BG, is of interest. 

¹Data from other speakers of American English (Bell-Berti, 1973) show
palatoglossus activity only for vowels.

When a consonant with palatoglossus activity follows a vowel with
palatoglossus activity, the two peaks merge and form one broader peak (the
/-um-/ in /fumpup/) (Figure 4). When this consonant precedes a vowel with
palatoglossus activity, two separate peaks are observed, as in the /-mu-/ of
/fupmup/ (the dotted line). The peaks in /fumpup/ do not merge as a consequence
of a time-smear effect: the nasal in /-mu-/ (/fupmup/) is shorter than the first
vowel in /-um-/ (/fumpup/). If the effect were due to temporal overlap of the
EMG signals, the peaks would merge in /-mu-/ utterances, where the beginnings of
the involved phones are closer than they are in /-um-/ utterances.
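
The reasoning in the preceding paragraph can be illustrated with a small
simulation. It is only a sketch under stated assumptions: each burst of
smoothed EMG activity is modeled as a Gaussian-shaped bump, and the width and
separations are arbitrary illustrative values, not measurements from this
study. Under a pure time-smear account, two bursts of fixed width merge into one
peak when their centers are close and stay separate when they are far apart,
which is the opposite of the pattern observed.

```python
import numpy as np


def count_peaks(separation_ms, width_ms=40.0, step_ms=1.0):
    """Count local maxima in the sum of two equal Gaussian-shaped bursts.

    separation_ms -- distance between the two burst centers (illustrative)
    width_ms      -- standard deviation of each burst (illustrative)
    """
    t = np.arange(-500.0, 500.0 + step_ms, step_ms)
    first = np.exp(-0.5 * ((t + separation_ms / 2) / width_ms) ** 2)
    second = np.exp(-0.5 * ((t - separation_ms / 2) / width_ms) ** 2)
    total = first + second
    interior = total[1:-1]
    # A sample is a peak if it exceeds both of its neighbors
    is_peak = (interior > total[:-2]) & (interior > total[2:])
    return int(np.count_nonzero(is_peak))


for separation in (50.0, 200.0):
    print(separation, count_peaks(separation))
```

In this model closely spaced bursts are the ones that fuse, so a time-smear
explanation would predict merging for /-mu-/, where the phone onsets are closer,
rather than for /-um-/.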

Two possible explanations emerge for this difference in the pattern of 
palatoglossus activity in vowel-nasal and nasal-vowel combinations. The first
is that momentary relaxation of the palatoglossus is required to facilitate 
initiation of palatal elevation by the levator palatini for the production of 
an oral /u/, an articulation requiring a fairly tight velopharyngeal seal. 

Another more tantalizing, but highly speculative, explanation is that this 
pattern is a reflection of some aspect of syllabic organization: that motor 
commands may be merged for VC sequences but not for CV sequences. This specula- 
tion is based on limited but highly reliable data: the pattern occurs for all 
cases having palatoglossus activity for both the vowel and consonant members of 
the CV and VC pairs (including oral velar stops). It may reflect the generally 
greater constriction of the oral cavity for consonants than for vowels: activ- 
ity may increase through a vowel into a consonant but must decrease to permit a 
reduction of oral cavity constriction for a vowel following a consonant. 

While no final statement may be made about the cause of this difference in
activity patterns, the observation warrants further investigation.



ABSTRACT

Aspects of Intonation in Speech: Implications from an Experimental Study of
Fundamental Frequency*

James E. Atkinson+



In a study of intonation in American English several acoustic and
physiological correlates of the fundamental voice frequency (Fo) were
investigated. The goal was to determine the linguistically relevant acoustic and
physiological aspects of Fo and to relate these within a unified, phonetic
feature theory. Several aspects of Fo production were studied.

Inter- and Intra-Speaker Fo Variability 

A measure of the amount of Fo variability was obtained for various
repetitions by a single speaker and compared with that from several different
speakers. The results show the inter- and intra-speaker variabilities to be of
the same order of magnitude. A detailed study of the type of variability and its
probable causes was presented to determine its perceptual relevance. The results
offer strong evidence that nothing finer than a binary distinction (±Prominence)
can be made in terms of Fo. Phonetic theories which demand many fine Fo
distinctions seem to be overspecified.

Physiological Factors 

An electromyographic (EMG) study of several laryngeal muscles was conducted
using hooked-wire electrodes (Hirose, 1971) and the Haskins Laboratories' EMG
data system described by Port (1971). The muscles investigated were: vocalis,
cricothyroid, lateral cricoarytenoid, sternothyroid, and sternohyoid. These
were studied for several types of sentence intonation.

Subglottal, transglottal, and oral air pressure were measured for the same
utterances. Subglottal pressure was obtained using a tracheal catheter. In
addition, lung volume and air-flow rates were obtained for these utterances. In
this way a detailed study of all of the physiological factors known to affect Fo
was possible.

*Dissertation submitted in partial fulfillment of the requirements for the
degree of Doctor of Philosophy, University of Connecticut, Storrs.

+Naval Underwater Systems Center, New London, Conn.

Acknowledgment: Partial support for this research came from the Naval
Underwater Systems Center, New London, Conn., and from grants to Haskins
Laboratories, New Haven, Conn.

Intrinsic Fo in Vowels

The above physiological measures were also obtained for various
steady-state vowels in an attempt to explain the phenomenon of intrinsic Fo
differences between vowels. Traditionally this has been explained in terms of
mechanical interaction between the tongue and larynx, although recent
cineradiographic evidence contradicts this. The results of this study supported
an explanation offered by Flanagan and Landgraf (1968) which accounted for these
differences in terms of source-system acoustic coupling effects.

Computer-Implemented Correlation Analysis 

A correlation analysis technique was developed to allow a detailed look at
the physiological factors controlling Fo and at their interactions. The results
indicated that various factors may be involved in controlling Fo depending upon
the type of utterances, and the results suggested a "modal theory" of laryngeal
control employing two different laryngeal states, which appeared to be mediated
by the sternohyoid muscle in this study.
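
The abstract does not describe the correlation technique itself. As a rough
illustration of the general idea only, the sketch below computes a Pearson
correlation between an Fo contour and each of several physiological measures
sampled at the same time points. The function name, the measure names, and the
choice of Pearson correlation are assumptions made for this example and should
not be read as the method used in the dissertation.

```python
import numpy as np


def correlate_with_fo(fo, factors):
    """Return the Pearson correlation of each measure with the Fo contour.

    fo      -- sequence of Fo values sampled over an utterance
    factors -- dict mapping a measure name (hypothetical) to values
               sampled at the same time points as fo
    """
    fo = np.asarray(fo, dtype=float)
    result = {}
    for name, values in factors.items():
        values = np.asarray(values, dtype=float)
        if values.shape != fo.shape:
            raise ValueError(f"{name} is not sampled like fo")
        # Off-diagonal entry of the 2 x 2 correlation matrix
        result[name] = float(np.corrcoef(fo, values)[0, 1])
    return result
```

Comparing such coefficients across utterance types is one simple way to ask
whether a different factor dominates Fo control in statements than in
questions.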

Taken as a whole, the results of this study show an essential interaction
between the physical constraints and capabilities of the vocal apparatus and the
prosodic features. These features (Prominence and Breath-group) seem to be
structured and implemented to take maximum advantage of the normal vegetative
process of respiration, and to minimize the number of additional adjustments
from this state. Most simple declarative statements follow this pattern (denoted
"-Breath-group"), and the single most important factor in controlling Fo for
these utterances is subglottal pressure, although laryngeal adjustments also may
play a role. The evidence shows that utterances like yes-no questions (denoted
"+Breath-group") are "marked" in the sense that special respiratory and
laryngeal adjustments are required. All of this supports the notion that the
fundamental unit of intonation is the breath-group. Its function is to help
segment the nearly continuous train of speech sounds into linguistic units, and
to denote certain features of the underlying constituent structure. The
phonetic/phonological features (Prominence and Breath-group) are "signaling
units" which can be implemented in various ways. The results show that just as
the segmental phonemes are encoded into syllable-sized units (see Mattingly and
Liberman, 1969; Liberman, 1970), so the prosodic features are encoded and must
be perceived in terms of the entire breath-group, and not as discrete sequential
features. This study supports the notions of a motor theory of speech perception
(Liberman, Cooper, Shankweiler, and Studdert-Kennedy, 1967) and of an archetypal
breath-group as suggested by Lieberman (1967).



ABSTRACT

Levels of Processing in Speech Perception: Neurophysiological and
Information-Processing Analyses*

Charles C. Wood+



The relation between an acoustic speech signal and its phonetic message
appears to be a complex and highly efficient code, based on parallel
transmission of phonetic information in the speech signal. Previous experiments
have suggested that the perception of this "speech code" involves specialized
phonetic decoding mechanisms located in the dominant cerebral hemisphere,
mechanisms that are not involved in the perception of nonspeech sounds. This
suggestion has received additional support from the demonstration that some
components of a speech signal require specialized phonetic processing for their
perception, while other components can be processed by the general auditory
system alone. For example, recent experiments by the author have shown that
different levels of processing underlie the perception of auditory and phonetic
dimensions of synthetic speech stimuli. In one experiment, reaction time (RT)
data indicated that while the auditory dimension could be processed
independently of the phonetic dimension, the phonetic dimension could not be
processed independently of the auditory dimension. In a second experiment,
averaged evoked potentials were recorded during the processing of the same
auditory and phonetic dimensions.



*A dissertation presented to the faculty of the Graduate School of Yale
University, New Haven, Conn., in candidacy for the degree of Doctor of
Philosophy, June 1973. This dissertation was done in the Department of
Psychology under the direction of W. R. Goff, R. S. Day, and W. R. Garner. The
experimental work was carried out for the most part in the Neuropsychology
Laboratory of the West Haven Veterans Administration Hospital; the synthetic
speech stimuli were generated and analyzed in the Haskins Laboratories. The
thesis is summarized here primarily because of its relevance to the speech
research program of Haskins Laboratories. The content of the thesis will appear
in the next regular issue of these Status Reports.

+Now at the Walter Reed Army Institute of Research, Department of Experimental
Psychophysiology, Walter Reed Army Medical Center, Washington, D. C.

Acknowledgment: This dissertation was supported in part by the Veterans
Administration, Public Health Service National Institute of Mental Health
Grant M-05286 and National Science Foundation Grants GB-3919 and GB-5782 to
W. R. Goff, and by a National Institute of Child Health and Human Development
Grant HD-01994 to Haskins Laboratories.

This experiment demonstrated that the processing of the phonetic dimension was
accompanied by neural events in the left hemisphere which did not occur during
the processing of the auditory dimension. Thus, using very different response
measures, both experiments suggest that perception of the phonetic dimension
involved an additional level of processing which was not required for the
auditory dimension.

The present investigation was designed to substantiate the distinction
between auditory and phonetic levels of processing made by the initial
experiments, and to provide additional information concerning the nature of the
specialized phonetic level. Instead of collecting the RT and evoked potential
data separately, the present experiments employed a single paradigm to obtain
both sets of data. The first experiment completely replicated the RT and evoked
potential results for auditory and phonetic dimensions obtained separately in
the initial experiments. The second experiment provided control data
demonstrating that the results attributed to the phonetic level did not occur
for two auditory dimensions. The third experiment showed that the phonetic level
is specialized for the extraction of abstract phonetic features, not for the
detection of particular acoustic features in the speech signal. The fourth
experiment suggested that while the phonetic level is linguistic in nature, it
is not required for the processing of all acoustic dimensions that can convey
linguistic information. Additional analyses of the neurophysiological data
demonstrated that the evoked potential differences between auditory and phonetic
dimensions were not associated with differences in: 1) frequency spectra or
amplitude distributions of the background electroencephalogram (EEG); 2)
pre-stimulus baseline changes related to the contingent negative variation
(CNV); or 3) averaged activity synchronized to subjects' motor responses. Taken
together, the RT and evoked potential data of the present experiments provide a
strong set of converging operations upon the distinction between auditory and
phonetic levels of processing in speech perception, and upon the idea that the
phonetic level involves specialized language mechanisms which are lateralized in
one cerebral hemisphere.







II. PUBLICATIONS AND REPORTS 
III. APPENDIX 






PUBLICATIONS AND REPORTS 






Publications and Manuscripts 

The following three papers appeared in Proceedings of the Seventh International
Congress of Phonetic Sciences, Montreal, 1971. (The Hague: Mouton, 1972).

Glottal Modes in Consonant Distinctions.
Leigh Lisker and Arthur S. Abramson, 366-370.

Voice Timing in Korean Stops.
Arthur S. Abramson and Leigh Lisker, 439-446.

Further Experimental Studies of Fundamental Frequency Contours.
Michael Studdert-Kennedy and Kerstin Hadding, 1024-1031.



Phonetic Ability and Related Anatomy of the Newborn and Adult Human, Neanderthal
Man, and Chimpanzee. Philip Lieberman, Edmund S. Crelin, and Dennis H.
Klatt. American Anthropologist (1972) 74, 287-307.

Silent Articulation. Katherine S. Harris. Science (1972) 176, 1114-1115.

Word-Final Stops in Thai. Arthur S. Abramson. In Tai Phonetics and Phonology,
ed. by J. G. Harris and R. B. Noss. (Bangkok, Thailand: Central Institute
of English Language, 1973) 1-7.

A Plan for the Field Evaluation of an Automated Reading System for the Blind.
P. W. Nye, J. D. Hankins, T. Rand, I. G. Mattingly, and F. S. Cooper.
IEEE Transactions on Audio and Electroacoustics (June 1973) AU-21, 265-268.

Olson's Projective Verse and the Use of Breath Control as a Structural Element.
Marcia R. Lieberman and Philip Lieberman. Language and Style (1973) 5,
287-298.

The Phi Coefficient as an Index of Ear Differences in Dichotic Listening.
Gary M. Kuhn. Cortex (in press).

Hemiretinae and Nonmonotonic Masking Functions with Overlapping Stimuli.
Claire Farley Michaels and M. T. Turvey. Bulletin of the Psychonomic
Society (in press).

*Phonological Fusion of Synthetic Stimuli in Dichotic and Binaural Presentation.
James E. Cutting.

*Laryngeal Control in Korean Stop Production. Hajime Hirose, Charles Y. Lee, and
Tatsujiro Ushijima.

*Phonological Fusion of Stimuli Produced by Different Vocal Tracts. James E.
Cutting.

*Hemispheric Specialization for Speech Perception in Six-Year-Old Black and
White Children from Low and Middle Socioeconomic Classes. M. F. Dorman
and Donna S. Geffner.

*Oral Feedback, Part I: Variability of the Effect of Nerve-Block Anesthesia Upon
Speech. Gloria Jones Borden, Katherine S. Harris, and William Oliver.

*Oral Feedback, Part II: An Electromyographic Study of Speech Under Nerve-Block
Anesthesia. Gloria Jones Borden, Katherine S. Harris, and Lorne Catena.



*Appears in this report, SR-34.



Reports and Oral Presentations 

*Phonetic Prerequisites for First-Language Acquisition. Ignatius G. Mattingly. 
Presented at the International Symposium on First-Language Acquisition, 
Florence, Italy, September 1972; and at the Society for Research in Child 
Development meeting, Philadelphia, Pa., March 1973. 

Phenomena in Cognitive Psychology. Ruth S. Day. Connecticut Junior Science and 
Humanities Symposium, Yale University, New Haven, Conn., 2 April 1973. 

The following seven papers were presented at the 85th meeting of the Acoustical 
Society of America, Boston, Mass., 10-13 April 1973. 

* Reaction Times to Comparisons Within and Across Phonetic Categories:
Evidence for Auditory and Phonetic Levels of Processing.
David B. Pisoni and Jeffrey Tash.

* The Role of Auditory Short-Term Memory in Vowel Perception.
David B. Pisoni.

* Effects of Attenuation of One of Two Channels on Perception of Opposing
Pairs of Nonsense Syllables when Monotically and Dichotically
Presented. (Retitled for SR-34)
Susan Brady-Wood and Donald Shankweiler.

* Digit-Span Memory in Language-Bound and Stimulus-Bound Subjects.
Ruth S. Day.

* Patterns of Palatoglossus Activity and Their Implications for Speech
Organization.
F. Bell-Berti and H. Hirose.

Degree of Phrasal Stress: A Stable Lexical Feature?
Jane H. Gaitenby, George N. Sholes, and Gary M. Kuhn.

A Two-Pass Procedure for Synthesis-By-Rule.
Gary Kuhn.

Electromyographic Studies of Articulatory Organization. K. S. Harris. Invited
paper. Symposium on Organized Function in Craniofacial Dynamics,
International Organization for Dental Research, Washington, D. C., 12 April 1973.

Evaluation of New Techniques for Research in Oral Muscle Function. K. S. Harris.
Invited paper. Research Week, Harvard Dental School, Cambridge, Mass.,
24 April 1973.

*On Learning "Secret Languages." Ruth S. Day. Presented at the Eastern
Psychological Association meeting, Washington, D. C., 3 May 1973.



Hemispheric Specialization and Individual Differences in Cognition. Ruth S. Day. 
Invited address. Center for Advanced Study in the Behavioral Sciences, 
Stanford, Calif., 11 May 1973. 

Colloquia. Ruth S. Day. Wesleyan University, Middletown, Conn., 25 April 1973;
Stanford University, Stanford, Calif., 11 May 1973.

*A Note on the Relation between Action and Perception. M. T. Turvey. Invited
address, Allerton Conference of the North American Society for the
Psychology of Sport and Physical Activity, Monticello, Ill., 13-17 May 1973.

Some Aspects of Speech Perception. M. T. Turvey. Hampshire College, Amherst, 
Mass. , May 1973. 

Two Central Processes in Vision. M. T. Turvey. Harvard University, Boston, 
Mass., May 1973. 

Invited Talks. M. T. Turvey, R. S. Day, D. B. Pisoni, D. P. Shankweiler,
M. Studdert-Kennedy, and J. E. Cutting. Haskins Laboratories' Workshop on
Speech Perception, Wallingford, Conn., 5-6 June 1973.

Can the Pigeon See Sideways — A Study of the Visual System. Pat Nye. Yale 

University, New Haven, Conn., Summer Seminar on Image Processing, 13 June 
1973. 



Dissertations 

*Levels of Processing in Phonological Fusion. James Eric Cutting. Ph.D.
dissertation, Yale University, New Haven, Conn., 1973.